Phi-4-Reasoning-Vision Code Example: Streaming Output Parsing with TextIteratorStreamer

news 2026/7/27 9:54:49

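1. Project Overview

Phi-4-Reasoning-Vision is an inference tool built on Microsoft's Phi-4-reasoning-vision-15B multimodal model and tuned for a dual RTX 4090 setup. It follows the official SYSTEM PROMPT specification and supports THINK and NOTHINK reasoning modes, combined image-and-text input, streaming output, and a collapsible display of the thinking process.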

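1.1 Core Features

Dual-GPU parallelism: the 15B model is split automatically across two RTX 4090 cards
Prompt conformance: follows the official SYSTEM PROMPT requirements
Streaming output parsing: streams text piece by piece and separates the thinking process from the final answer
Multimodal input: handles an uploaded image together with a text question
Interactive interface: a wide-layout UI built with Streamlit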

2. Environment Setup and Deployment

2.1 Hardware Requirements

Two NVIDIA RTX 4090 graphics cards
At least 64 GB of system memory
Ubuntu 20.04 or 22.04 recommended
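The two-card requirement can be checked with a back-of-the-envelope estimate. The sketch below assumes exactly 15 billion parameters (taken from the "15B" in the model name), bfloat16 weights as used later in this article, and 24 GB of VRAM per RTX 4090. It counts weights only, so actual usage will be higher once activations and the KV cache are included.

```python
# Rough weight-memory estimate; ignores activations and the KV cache.
PARAMS = 15e9          # nominal parameter count, assumed from the "15B" model name
BYTES_PER_PARAM = 2    # bfloat16 stores each weight in 2 bytes
GPU_MEMORY_GIB = 24    # VRAM of one RTX 4090
NUM_GPUS = 2

weights_gib = PARAMS * BYTES_PER_PARAM / 2**30
per_gpu_gib = weights_gib / NUM_GPUS
headroom_gib = GPU_MEMORY_GIB - per_gpu_gib

print(f"weights: {weights_gib:.1f} GiB total, {per_gpu_gib:.1f} GiB per GPU")
print(f"headroom per GPU: {headroom_gib:.1f} GiB")
```

Under these assumptions the weights alone exceed the memory of a single card, while an even split leaves each card some headroom for generation.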

2.2 Installing Software Dependencies

    pip install torch==2.1.0 transformers==4.33.0 streamlit==1.25.0
    pip install accelerate bitsandbytes

2.3 Model Download and Configuration

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "microsoft/phi-4-reasoning-vision-15B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # device_map="auto" lets accelerate place the layers across the available GPUs
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        torch_dtype=torch.bfloat16,  # 2 bytes per weight
    )

3. Streaming Output Parsing with TextIteratorStreamer

3.1 Basic Streaming Output

    from threading import Thread
    from transformers import TextIteratorStreamer

    def generate_stream_response(prompt, model, tokenizer):
        # skip_prompt=True keeps the echoed prompt out of the stream
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=512)
        # generate() blocks, so run it in a worker thread and consume the streamer here
        thread = Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()
        for new_text in streamer:
            yield new_text
        thread.join()
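The pattern above is a producer/consumer hand-off: the generation thread pushes decoded text while the calling thread iterates over it. The following sketch illustrates that hand-off with a plain queue and no model. QueueStreamer and fake_generate are hypothetical names invented for this illustration. They are a simplified stand-in, not the transformers implementation.

```python
from queue import Queue
from threading import Thread

_STOP = object()  # sentinel that marks the end of the stream


class QueueStreamer:
    """Hypothetical, simplified stand-in for a streamer: an iterable queue."""

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        # Called by the producer thread for each decoded piece of text.
        self._queue.put(text)

    def end(self):
        # Called by the producer once generation has finished.
        self._queue.put(_STOP)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer supplies an item
        if item is _STOP:
            raise StopIteration
        return item


def fake_generate(streamer, pieces):
    """Stands in for model.generate: pushes text pieces, then signals the end."""
    for piece in pieces:
        streamer.put(piece)
    streamer.end()


streamer = QueueStreamer()
worker = Thread(target=fake_generate, args=(streamer, ["Hel", "lo", ", ", "world"]))
worker.start()
received = list(streamer)  # consumer side: iterate while the worker produces
worker.join()
print("".join(received))
```

The sentinel is what lets the consumer's loop terminate. Without an end signal the iteration would block forever on an empty queue.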

3.2 Parsing THINK/NOTHINK Output

    def parse_think_mode(text_stream):
        """Consume the stream and return (think_content, final_answer)."""
        # Buffer the whole stream first: a tag may be split across chunks or
        # share a chunk with other text, so per-chunk checks are unreliable.
        full_text = "".join(text_stream)
        before, open_tag, rest = full_text.partition("<think>")
        if not open_tag:
            # No thinking block was produced; everything is the answer
            return "", full_text.strip()
        think_content, _, after = rest.partition("</think>")
        return think_content.strip(), (before + after).strip()
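When the thinking text and the answer need to be shown while generation is still running, the split has to happen incrementally. The sketch below is one possible approach, not the article's implementation: it yields labelled pieces as chunks arrive and holds back any trailing characters that could be the start of a tag. The name iter_think_events and the ("think" / "answer") labels are hypothetical. It assumes the streamer emits the literal tags <think> and </think> as text.

```python
OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def _held_back(buffer, tag):
    """Length of the longest buffer suffix that is a proper prefix of tag."""
    for size in range(min(len(buffer), len(tag) - 1), 0, -1):
        if buffer.endswith(tag[:size]):
            return size
    return 0


def iter_think_events(chunks):
    """Yield ("think", text) or ("answer", text) pieces as chunks arrive."""
    buffer, in_think = "", False
    for chunk in chunks:
        buffer += chunk
        while True:
            tag = CLOSE_TAG if in_think else OPEN_TAG
            section = "think" if in_think else "answer"
            index = buffer.find(tag)
            if index == -1:
                break
            if index > 0:
                yield section, buffer[:index]
            buffer = buffer[index + len(tag):]
            in_think = not in_think
        # Emit what is safe; keep a possible partial tag for the next chunk.
        keep = _held_back(buffer, CLOSE_TAG if in_think else OPEN_TAG)
        safe = buffer[:len(buffer) - keep]
        buffer = buffer[len(buffer) - keep:]
        if safe:
            yield ("think" if in_think else "answer"), safe
    if buffer:  # stream ended on an incomplete tag: treat it as plain text
        yield ("think" if in_think else "answer"), buffer


# Tags deliberately split across chunk boundaries
chunks = ["<th", "ink>2+2", " is 4</th", "ink>The answer", " is 4."]
events = list(iter_think_events(chunks))
think = "".join(text for section, text in events if section == "think")
answer = "".join(text for section, text in events if section == "answer")
```

A caller can route "think" pieces to a collapsible area and "answer" pieces to the main output as each event arrives.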

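3.3 Multimodal Input Handling

    from PIL import Image
    from transformers import AutoProcessor

    # Assumption: the checkpoint ships a processor that AutoProcessor can load.
    # A plain tokenizer does not accept an images argument.
    processor = AutoProcessor.from_pretrained(model_name)

    def process_multimodal_input(image_path, prompt):
        image = Image.open(image_path).convert("RGB")
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        # Move the tensors to the device that holds the model's first layers
        return inputs.to(model.device)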

4. Complete Inference Pipeline

4.1 Main Inference Function

    def run_inference(image_path, question, think_mode=True):
        # Build the system prompt for the selected mode
        system_prompt = "<|system|>\n"
        if think_mode:
            system_prompt += "You are an AI assistant that thinks step by step."
        else:
            system_prompt += "You are an AI assistant that answers directly."

        # Assemble the full prompt, then encode it together with the image
        full_prompt = system_prompt + "\n<|user|>\n" + question + "\n<|assistant|>\n"
        inputs = process_multimodal_input(image_path, full_prompt)

        # Generate in a worker thread so the streamer can be consumed here
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
        generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=1024)
        thread = Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()

        # Always return (think_content, final_answer), matching show_results
        if think_mode:
            result = parse_think_mode(streamer)
        else:
            result = (None, "".join(streamer))
        thread.join()
        return result

4.2 Dual-GPU Load Balancing

    def balance_gpu_load():
        model = AutoModelForCausalLM.from_pretrained(
            "microsoft/phi-4-reasoning-vision-15B",
            device_map={
                "": 0,  # default device for modules not listed below
                "model.layers.0": 0,
                "model.layers.1": 0,
                # ... middle layers distributed evenly ...
                "model.layers.30": 1,
                "model.layers.31": 1,
                "lm_head": 1,
            },
            torch_dtype=torch.bfloat16,
        )
        return model
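Writing out one entry per layer by hand is error-prone, so the map can also be generated. The sketch below is a hypothetical helper (build_device_map is not part of any library) that reuses the module names and the 32-layer count implied by the example above. Both are assumptions that should be checked against the loaded model, for example by inspecting model.named_modules().

```python
def build_device_map(num_layers=32, num_gpus=2):
    """Assign contiguous, equally sized blocks of decoder layers to each GPU."""
    device_map = {"": 0}  # default device for any module not listed explicitly
    for layer in range(num_layers):
        # Integer arithmetic: the first num_layers / num_gpus layers map to
        # GPU 0, the next block to GPU 1, and so on.
        device_map[f"model.layers.{layer}"] = layer * num_gpus // num_layers
    device_map["lm_head"] = num_gpus - 1  # keep the output head with the last layers
    return device_map


device_map = build_device_map()
print(device_map["model.layers.15"], device_map["model.layers.16"])
```

The resulting dictionary can be passed as the device_map argument in place of the hand-written one.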

5. Streamlit Interface

5.1 Layout

    import streamlit as st

    def setup_ui():
        st.set_page_config(layout="wide")
        st.title("Phi-4-Reasoning-Vision Multimodal Inference Tool")
        # Narrow column for inputs, wide column for results
        col1, col2 = st.columns([1, 2])
        with col1:
            st.header("Settings")
            uploaded_file = st.file_uploader("Upload an image to analyze", type=["jpg", "png"])
            question = st.text_area("Ask your question", height=100)
            think_mode = st.checkbox("Enable THINK mode", value=True)
        with col2:
            st.header("Inference Result")
            if uploaded_file:
                st.image(uploaded_file, caption="Uploaded image", use_column_width=True)
        return uploaded_file, question, think_mode

5.2 Displaying Results

    def show_results(think_content, final_answer, think_mode):
        # Show the thinking process in a collapsible section
        if think_mode and think_content:
            with st.expander("Thinking process"):
                st.write(think_content)
        if final_answer:
            st.subheader("Final answer")
            st.write(final_answer)
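The function above renders the result once generation has finished. To update the page while text is still arriving, one option is to redraw the cumulative text after each chunk. The helper below is a minimal sketch of that accumulation step. running_text is a hypothetical name, and the chunk list stands in for a real streamer.

```python
def running_text(chunks):
    """Yield the cumulative text after each chunk, ready to redraw in place."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text


frames = list(running_text(["The ", "answer ", "is 4."]))
```

In a Streamlit app, each frame could be written to a placeholder created with st.empty(), for example by calling placeholder.markdown(frame) inside the loop, so the same element is overwritten instead of new elements being appended.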