当前位置：首页 > news >正文

Phi-3-mini-128k-instruct实战教程：基于vLLM API封装REST接口供Web端调用

news 2026/4/20 5:14:15

Phi-3-mini-128k-instruct实战教程：基于vLLM API封装REST接口供Web端调用

1. 模型简介

Phi-3-Mini-128K-Instruct是一个38亿参数的轻量级开放模型，属于Phi-3系列的最新成员。这个模型经过精心训练，特别擅长理解和执行各种指令任务。

模型的主要特点包括：

支持长达128K tokens的上下文处理能力
在常识推理、语言理解、数学计算和编程任务上表现优异
经过监督微调和直接偏好优化，确保响应质量和安全性
相比同类模型，在参数规模小于130亿的模型中性能领先

2. 环境准备与部署验证

2.1 检查模型服务状态

部署完成后，可以通过以下命令验证服务是否正常运行：

cat /root/workspace/llm.log

如果看到类似下面的输出，说明模型已成功加载并准备好接收请求：

Loading model weights... Model loaded successfully vLLM API server started on port 8000

2.2 使用Chainlit进行初步测试

Chainlit提供了一个简单的前端界面，可以快速测试模型功能。

2.2.1 启动Chainlit界面

在终端运行以下命令启动Chainlit：

chainlit run app.py

这将打开一个本地Web界面，您可以直接与模型交互。

2.2.2 测试模型响应

在Chainlit界面中输入问题，例如： "请用简单的语言解释量子计算的基本概念"

模型应该会返回一个结构清晰、易于理解的回答，展示其指令遵循能力。

3. 封装REST API接口

3.1 创建FastAPI应用

我们将使用FastAPI来封装vLLM的原始API，提供更友好的REST接口。

首先安装必要的依赖：

pip install fastapi uvicorn requests

然后创建api_server.py文件：

from fastapi import FastAPI from pydantic import BaseModel import requests app = FastAPI() class PromptRequest(BaseModel): prompt: str max_tokens: int = 512 temperature: float = 0.7 @app.post("/generate") async def generate_text(request: PromptRequest): vllm_url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} payload = { "prompt": request.prompt, "max_tokens": request.max_tokens, "temperature": request.temperature } response = requests.post(vllm_url, json=payload, headers=headers) return response.json()

3.2 启动API服务

运行以下命令启动FastAPI服务：

uvicorn api_server:app --host 0.0.0.0 --port 5000

现在您可以通过http://localhost:5000/generate访问封装后的API。

4. Web前端集成

4.1 创建简单的前端页面

创建一个index.html文件：

<!DOCTYPE html> <html> <head> <title>Phi-3 Mini 交互界面</title> <style> body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; } #response { margin-top: 20px; white-space: pre-wrap; } textarea { width: 100%; height: 100px; } button { margin-top: 10px; padding: 8px 16px; } </style> </head> <body> <h1>Phi-3 Mini 128K Instruct</h1> <textarea id="prompt" placeholder="输入您的问题或指令..."></textarea> <button onclick="generateText()">生成回答</button> <div id="response"></div> <script> async function generateText() { const prompt = document.getElementById('prompt').value; const responseDiv = document.getElementById('response'); responseDiv.textContent = "正在生成回答..."; try { const response = await fetch('http://localhost:5000/generate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ prompt: prompt }) }); const data = await response.json(); responseDiv.textContent = data.choices[0].text; } catch (error) { responseDiv.textContent = "请求出错: " + error.message; } } </script> </body> </html>

4.2 测试完整流程

确保vLLM服务运行在端口8000
确保FastAPI服务运行在端口5000
在浏览器中打开index.html文件
输入问题并点击"生成回答"按钮
观察模型返回的结果

5. 进阶配置与优化

5.1 添加身份验证

为了保护API，我们可以添加简单的API密钥验证。修改api_server.py：

from fastapi import FastAPI, HTTPException, Header from pydantic import BaseModel import requests app = FastAPI() API_KEY = "your-secret-key" # 替换为实际密钥 class PromptRequest(BaseModel): prompt: str max_tokens: int = 512 temperature: float = 0.7 @app.post("/generate") async def generate_text( request: PromptRequest, authorization: str = Header(None) ): if authorization != f"Bearer {API_KEY}": raise HTTPException(status_code=403, detail="Invalid API key") vllm_url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} payload = { "prompt": request.prompt, "max_tokens": request.max_tokens, "temperature": request.temperature } response = requests.post(vllm_url, json=payload, headers=headers) return response.json()

5.2 前端添加API密钥

修改前端JavaScript代码：

async function generateText() { const prompt = document.getElementById('prompt').value; const responseDiv = document.getElementById('response'); responseDiv.textContent = "正在生成回答..."; try { const response = await fetch('http://localhost:5000/generate', { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer your-secret-key' }, body: JSON.stringify({ prompt: prompt }) }); const data = await response.json(); responseDiv.textContent = data.choices[0].text; } catch (error) { responseDiv.textContent = "请求出错: " + error.message; } }