
Phi-4-mini-reasoning Chainlit Hands-On Tutorial: Custom UI with a Seamlessly Connected vLLM Backend

news 2026/6/7 17:20:48


1. Environment Setup and Quick Deployment

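Before starting, make sure your system meets the basic requirements. Phi-4-mini-reasoning is a lightweight (3.8B-parameter) open-source model with a 128K-token context window, which makes it well suited to reasoning tasks that involve long inputs.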

1.1 System Requirements

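Operating system: Linux (Ubuntu 20.04 or later recommended)
Python: 3.9 or later (current vLLM releases no longer support 3.8; check the vLLM installation guide for the exact supported range)
GPU: at least 16 GB of VRAM (NVIDIA A10G or better recommended)
RAM: 32 GB or more
Disk: at least 50 GB of free space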
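Before installing anything, the software-side items in this list can be checked from Python. The following is a minimal sketch that uses only the standard library; the function name `preflight` is hypothetical, the `(3, 9)` default assumes 3.9 is the minimum Python version for current vLLM, and GPU memory is left to `nvidia-smi`.

```python
import platform
import shutil
import sys


def preflight(path: str = "/", min_python: tuple = (3, 9), min_disk_gb: float = 50.0) -> dict:
    """Return pass/fail flags for the OS, the Python version and free disk space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "linux": platform.system() == "Linux",
        "python": sys.version_info[:2] >= min_python,
        "disk": free_gb >= min_disk_gb,
    }


if __name__ == "__main__":
    # Print one line per check, e.g. "python: OK"
    for name, ok in preflight().items():
        print(f"{name:>6}: {'OK' if ok else 'FAIL'}")
```

Pass the directory where you plan to store the model weights as `path`, since that is the file system that needs the free space.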

1.2 Installing the vLLM Backend

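Install vLLM with pip. The vllm package already pulls in a matching PyTorch build as a dependency, so a separate `pip install torch` (for example from the cu118 index) is normally unnecessary; if you need a specific CUDA build, follow the vLLM installation guide instead:

```bash
pip install vllm
```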

After installation, verify that vLLM can be imported:

```bash
python -c "import vllm; print(vllm.__version__)"
```

2. Deploying the Phi-4-mini-reasoning Model

2.1 Downloading the Model Weights

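Phi-4-mini-reasoning is an open-source model whose weights are available on the Hugging Face Hub. The weight files are stored with Git LFS, so enable it before cloning:

```bash
git lfs install
git clone https://huggingface.co/username/Phi-4-mini-reasoning
```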

2.2 Starting the vLLM Server

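Serve the model with vLLM's built-in API server:

```bash
python -m vllm.entrypoints.api_server \
    --model /path/to/Phi-4-mini-reasoning \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9 \
    --max-num-seqs 256 \
    --max-model-len 128000
```

This is vLLM's simple demo server: by default it listens on port 8000 and accepts POST requests at `/generate`, which the Chainlit code in section 3 relies on. `--tensor-parallel-size 1` runs the model on a single GPU, `--gpu-memory-utilization 0.9` lets vLLM use up to 90% of GPU memory, and `--max-num-seqs 256` caps the number of sequences processed concurrently. `--max-model-len 128000` requires enough KV-cache memory for one full 128K-token sequence; if the GPU cannot provide it, vLLM stops at startup with an error suggesting a higher `gpu_memory_utilization` or a lower `max_model_len`. On a 16 GB card, starting with a smaller value such as `--max-model-len 32768` may be necessary.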
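Whether `--max-model-len 128000` fits is mostly a matter of KV-cache arithmetic: every token stores keys and values for every layer, i.e. 2 × layers × KV heads × head dimension × bytes per element. The sketch below does that calculation. The architecture numbers in the example (32 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions for illustration only; read the real values from the model's `config.json` (`num_hidden_layers`, `num_key_value_heads`, and `hidden_size / num_attention_heads`).

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies: keys + values for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_tokens_in_cache(cache_gib: float, per_token_bytes: int) -> int:
    """Number of tokens that fit into a KV-cache budget given in GiB."""
    return int(cache_gib * 1024**3 // per_token_bytes)


if __name__ == "__main__":
    # Hypothetical architecture values; check config.json for the real ones
    per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
    print(f"{per_token / 1024:.0f} KiB per token")
    print(f"{128000 * per_token / 1024**3:.1f} GiB for one 128000-token sequence")
    print(f"{max_tokens_in_cache(10.0, per_token)} tokens fit in a 10 GiB cache")
```

With these assumed values each token takes 128 KiB, so a single 128000-token sequence would need about 15.6 GiB of KV cache on top of the model weights, which is why smaller GPUs may need a lower `--max-model-len`.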

2.3 Verifying the Service

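Once the server is running, check its log (for example from a web shell) to confirm the deployment. The path below assumes the server's output was redirected to this file; adjust it to your own setup:

```bash
cat /root/workspace/llm.log
```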

Output similar to the following means the server started successfully:

```text
INFO 07-01 15:30:12 llm_engine.py:72] Initializing an LLM engine with config...
INFO 07-01 15:30:15 llm_engine.py:150] KV cache size: 10.00 GB
INFO 07-01 15:30:15 llm_engine.py:151] # GPU blocks: 1024
INFO 07-01 15:30:15 api_server.py:120] Started server process [1234]
```
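Besides reading the log, you can send a test request straight to the server. The sketch below uses only the standard library and assumes the server from 2.2 is listening on `localhost:8000`; the helper names `extract_completion` and `smoke_test` are hypothetical. It also assumes the behavior of vLLM's demo `/generate` endpoint, which returns `{"text": [...]}` with the original prompt at the start of each string, so the helper strips it (and leaves the text untouched if the prompt is not there).

```python
import json
import urllib.request

VLLM_SERVER = "http://localhost:8000"  # assumed address of the server from 2.2


def extract_completion(body: bytes, prompt: str) -> str:
    """Return the generated text, dropping the echoed prompt if present."""
    text = json.loads(body)["text"][0]
    return text[len(prompt):] if text.startswith(prompt) else text


def smoke_test(prompt: str = "What is 12 * 7?", max_tokens: int = 64) -> str:
    """POST one non-streaming request to /generate and return the completion."""
    payload = json.dumps(
        {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.0}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{VLLM_SERVER}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_completion(resp.read(), prompt)


if __name__ == "__main__":
    print(smoke_test())
```

A connection error here usually means the server is still loading the model or is listening on a different port.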

3. Chainlit Front-End Integration

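Chainlit is an open-source Python library for quickly building chat-style interfaces for AI applications. The following sections walk through connecting Chainlit to the vLLM backend.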

3.1 Installing Chainlit

Install Chainlit and its dependencies with pip:

```bash
pip install chainlit
```

3.2 Creating the Chainlit App

Create a new Python file named app.py with the following basic code:

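```python
import asyncio

import chainlit as cl
import requests

# Address of the vLLM demo API server started in section 2.2
VLLM_SERVER = "http://localhost:8000"


@cl.on_chat_start
async def start_chat():
    await cl.Message(content="Phi-4-mini-reasoning is ready. Please enter your question...").send()


@cl.on_message
async def main(message: cl.Message):
    # Chainlit passes a cl.Message object; the user's text is in .content
    prompt = message.content
    data = {
        "prompt": prompt,
        "max_tokens": 1024,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    # requests is blocking, so run it in a worker thread to keep the event loop responsive
    response = await asyncio.to_thread(
        requests.post, f"{VLLM_SERVER}/generate", json=data, timeout=300
    )
    if response.status_code == 200:
        # The demo server returns {"text": [prompt + completion]}, so strip the echoed prompt
        text = response.json()["text"][0]
        if text.startswith(prompt):
            text = text[len(prompt):]
        await cl.Message(content=text.strip()).send()
    else:
        await cl.Message(content=f"Request failed: {response.text}").send()
```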

3.3 Customizing the UI

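Chainlit can expose adjustable generation parameters through a chat settings panel. Chainlit keeps only one `@cl.on_chat_start` handler (the last one registered replaces earlier ones), so replace the `start_chat` function in app.py with this version rather than adding a second handler:

```python
@cl.on_chat_start
async def start_chat():
    # Add two sliders to the chat settings panel
    await cl.ChatSettings(
        [
            cl.input_widget.Slider(
                id="temperature", label="Creativity", initial=0.7, min=0, max=1, step=0.1,
            ),
            cl.input_widget.Slider(
                id="max_tokens", label="Max length", initial=1024, min=64, max=4096, step=64,
            ),
        ]
    ).send()
    await cl.Message(content="Phi-4-mini-reasoning is ready. Please enter your question...").send()
```

The basic `main` handler from section 3.2 still sends fixed sampling values; the handler in section 4.1 reads these sliders via `cl.user_session.get("chat_settings")`.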
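Slider values arrive from the browser as plain numbers, so it can help to normalize them before they reach vLLM, whose `max_tokens` parameter expects an integer. A minimal sketch of such a helper; the name `build_payload`, the clamping to the slider ranges (0 to 1 and 64 to 4096), and the fallback defaults are illustrative assumptions, not part of Chainlit or vLLM.

```python
from typing import Any, Dict, Optional


def build_payload(prompt: str, settings: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn chat-settings values into a /generate request body."""
    settings = settings or {}
    # Clamp to the slider ranges and fall back to the slider defaults
    temperature = min(max(float(settings.get("temperature", 0.7)), 0.0), 1.0)
    max_tokens = min(max(int(settings.get("max_tokens", 1024)), 64), 4096)
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": 0.9,
    }
```

In the message handler, the hard-coded `data` dictionary could then be built from the user's text and `cl.user_session.get("chat_settings")`.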

4. Full Implementation

4.1 Enhanced Chainlit App

Below is a more complete Chainlit implementation that supports parameter adjustment, a generation mode selector, and streaming responses:

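```python
import json

import aiohttp
import chainlit as cl

# Address of the vLLM demo API server from section 2.2
VLLM_SERVER = "http://localhost:8000"


@cl.on_chat_start
async def start_chat():
    # Build the settings panel (temperature, length, mode)
    await cl.ChatSettings(
        [
            cl.input_widget.Slider(id="temperature", label="Creativity", initial=0.7, min=0, max=1, step=0.1),
            cl.input_widget.Slider(id="max_tokens", label="Max length", initial=1024, min=64, max=4096, step=64),
            cl.input_widget.Select(
                id="mode",
                label="Mode",
                values=["creative", "balanced", "precise"],
                initial_index=1,
            ),
        ]
    ).send()
    await cl.Message(
        content="""# Phi-4-mini-reasoning Chat Interface

Welcome to Phi-4-mini-reasoning, a lightweight model focused on high-quality reasoning.
You can adjust the parameters in the chat settings panel for the best results."""
    ).send()


async def forward_delta(msg: cl.Message, record: bytes, previous: str) -> str:
    """Stream only the new part of a cumulative snapshot; return the latest snapshot."""
    if not record.strip():
        return previous
    try:
        text = json.loads(record)["text"][0]
    except (json.JSONDecodeError, KeyError, IndexError):
        return previous  # skip malformed records
    if not text.startswith(previous):
        return previous
    await msg.stream_token(text[len(previous):])
    return text


@cl.on_message
async def main(message: cl.Message):
    settings = cl.user_session.get("chat_settings")

    # Adjust sampling parameters according to the selected mode
    if settings["mode"] == "creative":
        temperature = min(settings["temperature"] + 0.2, 1.0)
        top_p = 0.95
    elif settings["mode"] == "precise":
        temperature = max(settings["temperature"] - 0.2, 0.1)
        top_p = 0.7
    else:
        temperature = settings["temperature"]
        top_p = 0.9

    prompt = message.content
    data = {
        "prompt": prompt,
        "max_tokens": int(settings["max_tokens"]),  # vLLM expects an integer
        "temperature": temperature,
        "top_p": top_p,
        "stream": True,
    }

    # Empty message that is filled in as tokens arrive
    msg = cl.Message(content="")
    await msg.send()

    async with aiohttp.ClientSession() as session:
        async with session.post(f"{VLLM_SERVER}/generate", json=data) as resp:
            if resp.status != 200:
                msg.content = f"Request failed: {await resp.text()}"
                await msg.update()
                return

            # The demo server streams JSON records holding the cumulative text
            # (prompt + output so far), separated by "\0" or "\n" depending on
            # the vLLM version; buffer bytes until a record is complete.
            buffer = b""
            previous = prompt
            async for chunk in resp.content.iter_any():
                buffer += chunk.replace(b"\0", b"\n")
                *records, buffer = buffer.split(b"\n")
                for record in records:
                    previous = await forward_delta(msg, record, previous)
            # Handle a final record that arrived without a separator
            await forward_delta(msg, buffer, previous)

    await msg.update()
```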
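The delicate part of streaming from vLLM's demo server is that each JSON record carries a cumulative snapshot (the prompt plus everything generated so far), with records separated by a null byte or a newline depending on the vLLM version; forwarding each snapshot as-is would repeat text. A pure-Python sketch of the delta logic that can be unit-tested without a running server; the class name `SnapshotDecoder` is hypothetical.

```python
import json
from typing import List


class SnapshotDecoder:
    """Turn cumulative {"text": [prompt + output]} records into new-text deltas."""

    def __init__(self, prompt: str) -> None:
        self.buffer = b""
        self.previous = prompt  # snapshots start with the echoed prompt

    def feed(self, chunk: bytes) -> List[str]:
        """Consume raw bytes and return the deltas of every complete record."""
        # Accept either "\0" or "\n" as the record separator
        self.buffer += chunk.replace(b"\0", b"\n")
        *records, self.buffer = self.buffer.split(b"\n")
        return [d for d in (self._delta(r) for r in records) if d]

    def flush(self) -> List[str]:
        """Process a trailing record that arrived without a separator."""
        record, self.buffer = self.buffer, b""
        delta = self._delta(record)
        return [delta] if delta else []

    def _delta(self, record: bytes) -> str:
        if not record.strip():
            return ""
        try:
            text = json.loads(record)["text"][0]
        except (json.JSONDecodeError, KeyError, IndexError):
            return ""  # ignore malformed records
        if not text.startswith(self.previous):
            return ""
        delta, self.previous = text[len(self.previous):], text
        return delta
```

In a handler, create `decoder = SnapshotDecoder(prompt)`, call `decoder.feed(chunk)` for each chunk from `resp.content.iter_any()`, pass every returned delta to `msg.stream_token`, and call `decoder.flush()` after the loop. Because bytes are only decoded once a whole record has arrived, a UTF-8 character split across two network chunks is never decoded half-way.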

4.2 Running the Chainlit App

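Start the Chainlit app with the command below (`-w` reloads the app automatically when app.py changes). Chainlit also listens on port 8000 by default, which is already taken by the vLLM server, so run it on another port:

```bash
chainlit run app.py -w --port 8001
```

Then open http://localhost:8001 in a browser to see the chat interface.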

5. Advanced Features and Optimization

5.1 Adding Conversation History

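To make multi-turn conversations more coherent, store the conversation history in the user session and include the most recent messages in each prompt: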

```python
@cl.on_chat_start
async def start_chat():
    cl.user_session.set("conversation_history", [])
    # ... rest of the initialization code ...


@cl.on_message
async def main(message: cl.Message):
    history = cl.user_session.get("conversation_history")
    history.append({"role": "user", "content": message.content})

    # Build a prompt from the last 6 messages
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in history[-6:])
    prompt += "\nassistant:"

    # ... rest of the request and streaming code, sending `prompt` as the request's "prompt" ...
    # (msg.stream_token() appends every streamed token to msg.content)

    # Store the complete reply once, after streaming has finished
    history.append({"role": "assistant", "content": msg.content})
    cl.user_session.set("conversation_history", history)
```
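The `role: content` prompt above is a simple generic format; instruction-tuned models usually respond better when the prompt follows their own chat template. The sketch below assumes the Phi-4-mini style markers `<|system|>`, `<|user|>`, `<|assistant|>` and `<|end|>` shown on the model card; confirm the exact template in the model's `tokenizer_config.json`, or let `transformers`' `tokenizer.apply_chat_template` build it for you. The function name `build_chat_prompt` is hypothetical.

```python
from typing import Dict, List, Optional


def build_chat_prompt(
    history: List[Dict[str, str]],
    max_messages: int = 6,
    system: Optional[str] = None,
) -> str:
    """Format recent messages with Phi-4-mini style role markers (assumed template)."""
    parts = []
    if system:
        parts.append(f"<|system|>{system}<|end|>")
    for m in history[-max_messages:]:
        # The history list is expected to hold only user/assistant turns
        parts.append(f"<|{m['role']}|>{m['content']}<|end|>")
    # Open the assistant turn so the model continues from here
    parts.append("<|assistant|>")
    return "".join(parts)
```

The handler would then use `prompt = build_chat_prompt(history)` in place of the `"\n".join(...)` lines; the rest of the request code stays the same.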