
Building a Production-Ready ML Web App with FastAPI + React + Docker: A Practical Guide

news 2026/7/30 15:11:17

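1. This is not "yet another Flask tutorial": field notes on an ML web app that can ship, take load, and keep iterating

Have you been here? You get a model to 92% accuracy in Jupyter, you want to show business colleagues the result, and you get stuck at "how do I let someone click and use this without installing Python?" I have been through it three times. The first time I ran Streamlit locally and my colleague said "it won't open." The second time I put up a bare-bones Flask app and sent a link; they had to refresh three times before the button loaded. The third time I moved to Docker, and ops stopped me with one sentence: "this image isn't signed, so it can't enter the production network segment." On the fourth attempt I threw out the "teaching demo" mindset and built to real delivery standards from day one: the model must be retrainable, the API monitorable, the frontend loadable offline, and the deployment package auditable in one step. This article is that full pipeline, which now runs across 6 business lines and serves 3,000+ prediction requests a day.

The keywords are Machine Learning Web App, Build and Deploy, and Practical Guide, with the emphasis on "Practical", not "Theoretical". It does not derive gradient descent, but it does explain why scikit-learn==1.3.0 in requirements.txt has to be pinned to the exact version. It does not draw neural network diagrams, but it does take apart how the single line proxy_buffering off in the Nginx config rescued 90% of our timeout errors. It does not list 10 frameworks side by side, but it does measure the memory growth of Flask, FastAPI, and Gradio at 100 QPS. It is written for three kinds of readers: ML practitioners who have just trained their first model and want to ship it, backend engineers who were suddenly asked to "turn the model into a web page", and business owners who need to validate an MVP quickly without writing a line of frontend code. The question it answers is not "can it run" but "do you dare let people use it".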

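2. Overall architecture: why we dropped the "Jupyter + Streamlit" combo for the "FastAPI + React + Docker" trio

2.1 Working backward from three failures to the architecture

The first failure (Streamlit in local mode) exposed a mismatch with the delivery scenario. Streamlit is at heart an interactive analysis tool. Behind st.button() is a synchronously blocking Python process, so when user A clicks predict and user B clicks at the same moment, B has to wait for A's inference to finish before getting a thread. We tested 50 concurrent users: average response time jumped from 800 ms to 4.2 s, with CPU pinned at 100%. That is not a performance problem, it is a paradigm problem: Streamlit was never designed to be a service.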

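The second failure (plain Flask + Jinja2) exposed the trap of coupling frontend and backend. I had put the prediction logic and the HTML rendering in the same app.py. Then the business side asked for something small: "add an Export to Excel button to the results table." I spent 3 hours changing templates, adding a route, and pulling in pandas, only to find that the export triggered a model reload. Flask does not re-import modules on every request, but the way our code was structured the model load ran inside the request path, so in practice it was loaded again each time. Worse, once the model file passed 200 MB, every request paid that load, and time to first render sat above 12 seconds. The lesson: mixing the heavyweight IO of machine learning with the lightweight HTTP handling of a web service is asking for trouble.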

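The third failure (Docker + Flask + Gunicorn) exposed the lack of observability. The container was up and docker ps reported it healthy, yet the business side said "sometimes clicking does nothing." The logs showed only 504 Gateway Timeout, with no way to tell whether the model had hung, the database connection pool was exhausted, or an Nginx buffer had overflowed. With no metrics, no request tracing, and no histogram of prediction latency, we were repairing a circuit inside a black box.

So for the fourth attempt I skipped every option that merely "looks nice" and anchored on three hard requirements:

Isolation: the model inference process must be physically separated from the web service process, so one request cannot drag down the whole service.
Observability: every prediction request must carry a trace_id that links model latency, feature preprocessing latency, and serialization latency.
Auditability: the deployment package must be verifiable outside the development environment, that is, "hand ops a tarball, they unpack it and it runs, and they can confirm nothing malicious is hidden inside."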

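2.2 Why FastAPI is hard to replace as the backend core

Choosing FastAPI was not bandwagoning; it addresses two of the three requirements above. Start with isolation. FastAPI supports BackgroundTasks out of the box, but what matters more is how deeply it integrates async/await. Our model inference is CPU-bound and cannot be made asynchronous, but feature fetching, result storage, and log reporting are IO operations that can. I measured it: inside predict(), moving the Redis write to an awaited call on an async client raised QPS from 18 to 32 and cut P99 latency by 47%. (prometheus_client's Counter.inc() is a synchronous in-memory update, so it needs no await.) This is not magic: the event loop hands the time otherwise spent waiting on IO to other requests.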

Now observability. FastAPI generates OpenAPI docs automatically, but the real lifesaver is its middleware mechanism. I wrote this middleware:

    import time
    import uuid

    from starlette.middleware.base import BaseHTTPMiddleware
    from starlette.requests import Request

    # REQUEST_LATENCY_SECONDS is the Prometheus Histogram defined in section 3.4
    # (labels: method, endpoint).

    class MetricsMiddleware(BaseHTTPMiddleware):
        async def dispatch(self, request: Request, call_next):
            request_id = str(uuid.uuid4())
            start_time = time.time()
            # Put the trace_id on request.state so downstream handlers can read it
            request.state.trace_id = request_id
            response = await call_next(request)
            process_time = time.time() - start_time
            # Report to Prometheus: select the label set first, then observe
            REQUEST_LATENCY_SECONDS.labels(
                method=request.method,
                endpoint=request.url.path,
            ).observe(process_time)
            response.headers["X-Request-ID"] = request_id
            return response

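With this in place, every request automatically gets a unique ID, a latency measurement, and a response header. Ops can run curl -I http://api/predict, read X-Request-ID, and combine it with the ELK logs to find within 5 minutes which model version produces long-tail latency on which features. Getting the same effect in Flask means writing your own decorator, passing parameters by hand, and handling the exception branches yourself; by my estimate that is about 3x the code, and easier to get wrong.

Tip: don't assume "async = fast". If your model is an sklearn model loaded with joblib.load('model.pkl'), its internals are synchronous C code, and wrapping it in async def gains nothing. The value of FastAPI is that it lets you use the parts that really can be asynchronous (DB queries, cache reads and writes) efficiently, not that it forces CPU work to become asynchronous too.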
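One common way to act on this tip is to push the blocking call onto a worker thread so the event loop stays free for IO. The sketch below is illustrative only: slow_predict is a hypothetical stand-in for a synchronous model.predict(), and the heartbeat coroutine exists purely to show that the loop keeps running while inference blocks.

```python
import asyncio
import time


def slow_predict(x: float) -> float:
    """Hypothetical stand-in for a synchronous, blocking model.predict() call."""
    time.sleep(0.2)  # simulates blocking inference work
    return x * 2


async def heartbeat(ticks: list) -> None:
    """Appends a tick every 50 ms; it can only progress if the event loop is free."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.perf_counter())


async def handle_request(x: float) -> tuple:
    ticks: list = []
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # so the heartbeat keeps ticking while inference is in progress.
    result, _ = await asyncio.gather(
        asyncio.to_thread(slow_predict, x),
        heartbeat(ticks),
    )
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(handle_request(21.0)))
```

Calling slow_predict directly inside the coroutine would instead freeze the heartbeat until inference finished. A thread keeps the loop responsive but does not add CPU capacity; for heavy models, a separate process or worker pool is closer to the isolation requirement in section 2.1.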

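2.3 Why a React frontend is more "practical" than plain HTML + JS

Someone asked: "The prediction page is one input, one button, and one result box. Why bring in React?" The answer is deterministic state management. A real example: the business side wanted "when the user types a phone number, auto-fill the region and carrier." Written with jQuery, the code might be:

    $('#phone').on('input', async function () {
      const res = await fetch('/api/lookup?phone=' + this.value);
      $('#carrier').val((await res.json()).carrier);
    });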

It looks simple, but the handler fires on every keystroke, so typing "13812345678" quickly sends 11 requests (1, 13, 138, 1381, ...). An early request (say the one for 1381) can come back after the final one (13812345678), leaving the wrong carrier on screen. This is a race condition between requests. React with useEffect and AbortController handles it cleanly:

    useEffect(() => {
      if (phone.length < 11) return;
      const controller = new AbortController();
      const fetchCarrier = async () => {
        try {
          const res = await fetch(`/api/lookup?phone=${phone}`, { signal: controller.signal });
          const data = await res.json();
          setCarrier(data.carrier);
        } catch (e) {
          if (e.name !== 'AbortError') console.error(e);
        }
      };
      fetchCarrier();
      // Cleanup: cancel the previous request if it has not finished
      return () => controller.abort();
    }, [phone]);

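This guarantees that however fast the user types, only the request triggered by the latest input takes effect. That determinism is something plain DOM manipulation cannot offer once the business logic grows (linked dropdowns, input format validation, offline caching). The same frontend stack also carried the offline demo we later built for the sales team. Vite does not convert models by itself; the build ships model.onnx as a static asset, and the browser runs the lightweight model through a WebAssembly runtime such as ONNX Runtime Web, bypassing the server entirely.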

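2.4 Docker image layering: why the base image is `python:3.11-slim` and not `alpine`

This one was learned the hard way. Early on we used python:3.11-alpine; the image was only 120 MB and looked great. Then we added lightgbm and the build failed with a musl/glibc incompatibility error. Alpine uses musl libc, while most prebuilt Python scientific packages (especially those with C extensions) target glibc. Forcing apk add gcompat only created new dependency conflicts. We went back to python:3.11-slim (Debian-based). The image grew to 350 MB, but pip install lightgbm went through in one line.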

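Our Dockerfile uses a four-layer design:

    # Layer 1: base environment (rarely changes)
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Layer 2: model assets (updated infrequently)
    COPY models/ ./models/
    # No RUN commands here, to keep this layer clean

    # Layer 3: application code (updated frequently)
    COPY app/ ./app/
    COPY pyproject.toml .

    # Layer 4: startup command (development form, see the note below)
    CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

The payoff comes from layer caching. When only application code changes, docker build reuses the cached dependency and model layers and rebuilds from the third layer on, which takes seconds. When a model file changes (a new pkl version, for example), the second and third layers are rebuilt while the pip install layer stays cached. Only a change to requirements.txt or the Python base image rebuilds everything. In our CI pipeline, 90% of builds finish within 45 seconds, versus an average of 6 min 23 s with the old single-layer Dockerfile.

Note: --reload is for development only. Remove it from production images and switch to --workers 4 (optionally running the Uvicorn workers under gunicorn). --reload watches for file changes and restarts the process; inside a container this can make the PID 1 process exit unexpectedly, which sends Kubernetes into a loop of restarting the Pod.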

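3. Core details: 12 fatal details, from model packaging to API design

3.1 A model is not "ready once loaded": packaging has to solve four real problems

Many tutorials stop at joblib.load('model.pkl'), but in real deployments loading the model is only the first step. We have to deal with the following.

Problem 1: hard-coded model file path
Wrong: model = joblib.load('./models/rf_v2.pkl')
Risk: the local path does not exist inside the container, and different environments (dev/staging/prod) use different paths.
Right: drive the path from environment variables, with a fallback:

    import os
    from pathlib import Path

    import joblib

    MODEL_DIR = Path(os.getenv("MODEL_DIR", "/app/models"))
    MODEL_PATH = MODEL_DIR / os.getenv("MODEL_FILENAME", "rf_v2.pkl")

    # Fail fast at startup instead of on the first request
    if not MODEL_PATH.exists():
        raise RuntimeError(f"Model file not found at {MODEL_PATH}")

    model = joblib.load(MODEL_PATH)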

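Problem 2: feature engineering that differs from training
Wrong: writing df['age_group'] = df['age'].apply(lambda x: 'young' if x<30 else ...) on the spot inside predict()
Risk: training bins with pandas.cut() while prediction uses if-else, boundary values are handled differently, and online accuracy collapses.
Right: wrap the whole feature pipeline in serializable transformers and save it together with the model:

    import joblib
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Training time
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', StandardScaler(), ['age', 'income']),
            ('cat', OneHotEncoder(), ['gender', 'city']),
        ],
        remainder='passthrough',
    )
    pipeline = Pipeline([('prep', preprocessor), ('model', RandomForestClassifier())])
    pipeline.fit(X_train, y_train)  # X_train: DataFrame containing the four columns above
    joblib.dump(pipeline, 'full_pipeline.pkl')

    # Prediction time: column names are required, so pass a DataFrame, not a bare list
    pipeline = joblib.load('full_pipeline.pkl')
    row = pd.DataFrame([{'age': 25, 'income': 5000, 'gender': 'M', 'city': 'Beijing'}])
    result = pipeline.predict(row)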

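Problem 3: memory leaks from model hot reloading
Wrong: calling joblib.load() on every HTTP request.
Risk: model objects (XGBoost in particular) hold many C-level pointers; in our experience the Python GC does not reclaim them promptly, and memory grows linearly with the request count.
Right: a global singleton plus file watching. We use the watchdog library to watch the model directory:

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class ModelReloader(FileSystemEventHandler):
        def __init__(self, model_loader):
            # model_loader: the global singleton that holds the model; it must expose reload()
            self.model_loader = model_loader

        def on_modified(self, event):
            if event.src_path.endswith('.pkl'):
                print(f"Reloading model from {event.src_path}")
                self.model_loader.reload()

    # Register the watcher at startup
    observer = Observer()
    observer.schedule(ModelReloader(model_loader), path='/app/models', recursive=False)
    observer.start()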
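The watcher above calls reload() on a loader object that the snippet does not define. A minimal sketch of what such a singleton could look like follows. ModelHolder and its method names are hypothetical, and pickle stands in for joblib so the example runs on the standard library alone (as with joblib, only load files you trust).

```python
import pickle
import threading
from pathlib import Path


class ModelHolder:
    """Keeps one model in memory and swaps it atomically on reload()."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()
        self._model = None
        self.reload()

    def reload(self) -> None:
        # Load the new object first, then swap it in under the lock, so request
        # handlers never observe a half-loaded model.
        with self._path.open("rb") as f:
            new_model = pickle.load(f)
        with self._lock:
            self._model = new_model

    def get(self):
        # Handlers call get() per request; the file is never re-read here.
        with self._lock:
            return self._model
```

Because the old object is only dropped after the new one is fully loaded, a failed load (a truncated file, for instance) raises inside reload() and leaves the previous model serving traffic.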

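Problem 4: the prediction should be business-readable, not a bare number
Wrong: return {"prediction": 1, "probability": 0.83}
Risk: the business side cannot tell whether 1 means "high risk" or "low risk".
Right: save the label encoder mapping alongside the model:

    import joblib
    from sklearn.preprocessing import LabelEncoder

    # Training time
    le = LabelEncoder()
    y_encoded = le.fit_transform(y_train)
    joblib.dump(le, 'label_encoder.pkl')

    # Prediction time
    le = joblib.load('label_encoder.pkl')
    y_pred = model.predict(X_test)
    y_pred_label = le.inverse_transform(y_pred)  # e.g. ['high_risk', 'low_risk']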

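3.2 API design: why we rejected REST-style routes in favor of a GraphQL-style single endpoint

We first designed REST endpoints such as /api/v1/users/{id}/risk and /api/v1/transactions/{id}/fraud. Two weeks later the business side asked: "Can user risk and the fraud probability of the last three transactions come back together?" That meant adding a new endpoint or reworking the two existing ones. Meanwhile the frontend had to send two requests to render one card, and faced race conditions doing so.

So we removed all the REST endpoints and kept a single POST /api/predict whose request body has a GraphQL-like structure:

    {
      "model": "user_risk_v3",
      "inputs": {
        "user_id": "U123456",
        "features": {
          "age": 28,
          "income": 12000,
          "last_login_days_ago": 2
        }
      },
      "include_explanation": true
    }

The response body has one uniform structure:

    {
      "status": "success",
      "trace_id": "a1b2c3d4",
      "result": {
        "label": "high_risk",
        "score": 0.92,
        "explanation": {
          "top_features": [
            {"name": "last_login_days_ago", "contribution": 0.41},
            {"name": "income", "contribution": -0.23}
          ]
        }
      }
    }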

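There are three benefits:

The frontend composes freely: for user risk it sends model=user_risk; for transaction fraud it sends model=transaction_fraud; for both it sends two requests itself, and the logic stays clear.
The backend extends without friction: a new model only needs to be registered in model_registry.py, with no route changes and no protocol changes.
Debugging is very friendly: ops sends JSON with curl and does not have to remember a pile of URL paths; the Postman collection holds a single request.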
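The article does not show model_registry.py, so the following is only a guess at its shape: a name-to-callable mapping that the single endpoint dispatches through. The register decorator, the dispatch function, and the placeholder scoring rule are all hypothetical.

```python
from typing import Any, Callable, Dict

# Maps a public model name to a callable that takes the feature dict.
_REGISTRY: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}


def register(name: str):
    """Decorator that adds a predictor under the given model name."""
    def decorator(fn):
        if name in _REGISTRY:
            raise ValueError(f"model {name!r} already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def dispatch(model: str, features: Dict[str, Any]) -> Dict[str, Any]:
    """Looks the model up by name; unknown names fail before any inference runs."""
    try:
        predictor = _REGISTRY[model]
    except KeyError:
        raise LookupError(f"unknown model: {model}") from None
    return predictor(features)


@register("user_risk_v3")
def _user_risk(features: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder rule standing in for a real pipeline.predict() call.
    score = 0.9 if features.get("last_login_days_ago", 0) <= 7 else 0.1
    return {"label": "high_risk" if score >= 0.5 else "low_risk", "score": score}
```

With this shape, the route handler reduces to one dispatch(payload.model, features) call, and adding a model means adding one decorated function.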

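Field note: include_explanation defaults to false. SHAP explanations are expensive to compute, so in production we enable them on a 1% sample of requests. We implement this with a Redis counter: "out of every 100 requests, the 100th is automatically set to true." That meets the audit requirement without slowing the main path.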
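The every-100th rule can be sketched as below. The article uses a Redis counter; here an in-process counter stands in for it so the example is self-contained, and the class and method names are hypothetical.

```python
import itertools
import threading


class EveryNthSampler:
    """Returns True on every n-th call; an in-process stand-in for a shared Redis counter."""

    def __init__(self, n: int = 100):
        if n < 1:
            raise ValueError("n must be >= 1")
        self._n = n
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def should_explain(self) -> bool:
        # The lock makes the increment safe when handlers run on several threads.
        with self._lock:
            current = next(self._counter)
        return current % self._n == 0
```

An in-process counter samples per worker process, so with several Uvicorn workers each worker explains its own 100th request. A counter kept in Redis is shared across workers, which is presumably why the article uses one.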

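3.3 Input validation: why we would rather write 50 extra lines of Pydantic than trust the frontend's JSON

The frontend can never be trusted. We learned this the hard way: once the frontend sent "age": "25" (a string), the model's predict() raised ValueError: could not convert string to float, and the whole service returned 500. We then added a Pydantic model:

    # Pydantic v2 syntax; on v1 use Field(regex=...) and @validator instead
    from pydantic import BaseModel, Field, field_validator

    class PredictionRequest(BaseModel):
        model: str = Field(..., min_length=3, max_length=50, pattern=r'^[a-z0-9_]+$')
        inputs: dict
        include_explanation: bool = False

        @field_validator('inputs')
        @classmethod
        def validate_inputs(cls, v):
            if 'user_id' not in v:
                raise ValueError('user_id is required in inputs')
            # age lives under inputs.features, matching the request body in section 3.2
            features = v.get('features')
            if not isinstance(features, dict):
                raise ValueError('features must be an object')
            age = features.get('age')
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(age, bool) or not isinstance(age, (int, float)):
                raise ValueError('age must be number')
            if age < 0 or age > 120:
                raise ValueError('age must be between 0 and 120')
            return v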

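This validator does four things:

It uses a regular expression to restrict the model name to lowercase letters, digits, and underscores, which blocks path traversal (such as model=../../etc/passwd).
It requires user_id, so empty values cannot slip through to the model layer.
It checks types strictly: if age arrives as the string "25", the isinstance check fails, a ValueError is raised, and FastAPI returns a 422 automatically.
It checks the range: age must be 0 to 120, which is more reliable than frontend JS validation (JS can be disabled or bypassed).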
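If the same name check is ever reused outside Pydantic (in a CLI or a batch job, say), Python's own re module has one trap worth knowing: with re.match, `$` also matches just before a trailing newline. A small sketch with a hypothetical helper name, using fullmatch to avoid that:

```python
import re

# Same character class as the model-name pattern in the validator above.
_MODEL_NAME = re.compile(r"[a-z0-9_]+")


def is_safe_model_name(name: str) -> bool:
    """True only if the whole string is 3-50 lowercase letters, digits, or underscores."""
    return 3 <= len(name) <= 50 and _MODEL_NAME.fullmatch(name) is not None
```

Here is_safe_model_name("user_risk_v3") is accepted, while "../../etc/passwd" and "user_risk_v3\n" are both rejected; the second one would pass a naive re.match(r"^[a-z0-9_]+$", ...).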

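Most importantly, every validation failure comes back as a structured error (abridged here; the exact msg and type strings vary between Pydantic versions):

    {
      "detail": [
        {
          "loc": ["body", "inputs"],
          "msg": "age must be number",
          "type": "value_error"
        }
      ]
    }

The frontend does not have to parse error text; it reads error.detail[0].loc to find where the problem is. Note that a validator attached to inputs as a whole reports loc only down to inputs. To get a path that points at the exact field (for example ["body", "inputs", "features", "age"]) and map it straight to a form control, declare inputs and features as nested typed models instead of a plain dict.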

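3.4 Logging and monitoring: production-grade observability in 100 lines of code

An ML service without monitoring is like an aircraft without instruments. We use the simplest setup that delivers three capabilities: metrics collection, log aggregation, and tracing.

Metrics collection (Prometheus):
Define three core metrics:

    from prometheus_client import Counter, Gauge, Histogram

    # Total number of requests
    REQUEST_COUNT = Counter(
        'ml_app_requests_total',
        'Total HTTP Requests',
        ['method', 'endpoint', 'http_status'],
    )

    # Latency histogram (unit: seconds)
    REQUEST_LATENCY_SECONDS = Histogram(
        'ml_app_request_latency_seconds',
        'HTTP Request Latency',
        ['method', 'endpoint'],
        buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
    )

    # Model load status (1 = loaded, 0 = not loaded)
    MODEL_LOADED = Gauge(
        'ml_app_model_loaded',
        'Model Load Status',
        ['model_name'],
    )

Instrument the prediction function:

    import time

    from fastapi import Request

    @app.post("/api/predict")
    async def predict(payload: PredictionRequest, request: Request):
        # payload: validated JSON body; request: raw request (carries request.state.trace_id)
        start_time = time.time()
        try:
            # Model prediction...
            result = model.predict(...)
            # Success metrics
            REQUEST_COUNT.labels(
                method="POST", endpoint="/api/predict", http_status=200
            ).inc()
            REQUEST_LATENCY_SECONDS.labels(
                method="POST", endpoint="/api/predict"
            ).observe(time.time() - start_time)
            # NumPy arrays are not JSON-serializable; convert to a plain list
            return {"result": result.tolist()}
        except Exception:
            # Error metrics
            REQUEST_COUNT.labels(
                method="POST", endpoint="/api/predict", http_status=500
            ).inc()
            raise  # bare raise keeps the original traceback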

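Log aggregation (structured JSON):
Use structlog to emit JSON instead of print():

    import structlog

    logger = structlog.get_logger()

    # Inside the handler: `request` is the Starlette Request,
    # `payload` is the validated PredictionRequest
    logger.info(
        "prediction_started",
        trace_id=request.state.trace_id,
        model=payload.model,
        user_id=payload.inputs.get("user_id"),
        features_count=len(payload.inputs.get("features", {})),
    )

With structlog configured to render JSON and add timestamps, the output looks like this:

    {"event": "prediction_started", "trace_id": "a1b2c3d4", "model": "user_risk_v3", "user_id": "U123456", "features_count": 3, "timestamp": "2024-05-20T14:23:11.123Z"}

Once ELK indexes the trace_id field, all logs from a single request can be strung together.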
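Passing trace_id into every log call by hand is easy to forget. One option is to bind it once per request in a context variable and let the formatter attach it. The sketch below uses only the standard library to show the mechanism; the logger name, field set, and helper names are assumptions, and structlog ships its own contextvars helpers that serve the same purpose.

```python
import contextvars
import io
import json
import logging

# Holds the trace_id of the request currently being handled.
trace_id_var: contextvars.ContextVar = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object and attaches the current trace_id."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "trace_id": trace_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Returns a logger that writes one JSON line per event to the given stream."""
    logger = logging.getLogger("ml_app_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    trace_id_var.set("a1b2c3d4")  # a middleware would do this once per request
    log.info("prediction_started")
    print(buffer.getvalue().strip())
```

Context variables are isolated per asyncio task, so concurrent requests do not see each other's trace_id.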

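Tracing (OpenTelemetry):
Use opentelemetry-instrument to inject traces automatically:

    opentelemetry-instrument \
        --traces_exporter console \
        --metrics_exporter console \
        uvicorn app.main:app --host 0.0.0.0 --port 8000

The frontend adds traceparent: 00-a1b2c3d4... to the request headers, and the backend links all child spans (DB queries, cache reads, model prediction) automatically. We do not run Jaeger, because the console exporter plus grep on the trace_id is enough for day-to-day troubleshooting.

Note: OpenTelemetry auto-instrumentation only covers libraries that have an instrumentation package (web frameworks, HTTP and database clients, and so on). Model loading and inference calls are not among them, so we add that span by hand:

    from opentelemetry import trace

    tracer = trace.get_tracer(__name__)

    # `payload` is the validated PredictionRequest, X the prepared feature matrix
    with tracer.start_as_current_span("model_predict") as span:
        span.set_attribute("model.name", payload.model)
        result = model.predict(X)
        span.set_attribute("prediction.score", float(result[0]))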

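4. Hands-on: building a deliverable ML web app from scratch

4.1 Environment setup: manage dependencies with Poetry and leave `requirements.txt` hell behind

pip freeze > requirements.txt is a beginner's graveyard. It dumps every package in the environment, direct and transitive alike, as an exact pin (numpy==1.24.3), so you can no longer tell which dependencies you actually chose or which version ranges are acceptable (numpy>=1.24.0,<1.25.0). We use Poetry:

    # Initialize the project
    poetry init
    # Add dependencies (pyproject.toml is updated automatically)
    poetry add fastapi uvicorn pydantic scikit-learn pandas
    poetry add --group dev pytest black mypy
    # Generate the lock file (similar to npm shrinkwrap)
    poetry lock
    # Export production dependencies (dev group excluded)
    poetry export -f requirements.txt --without-hashes --without dev > requirements.txt

Key parts of pyproject.toml:

    [tool.poetry.dependencies]
    python = "^3.11"
    fastapi = "^0.110.0"
    scikit-learn = { version = "^1.3.0", python = "^3.11" }
    # Note: the python marker keeps Poetry from resolving 3.11-only packages in a 3.10 environment

    [tool.poetry.group.dev.dependencies]
    pytest = "^7.4.0"
    black = "^23.10.0"

    [build-system]
    requires = ["poetry-core"]
    build-backend = "poetry.core.masonry.api"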

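The benefits:

With poetry.lock committed, poetry install installs exactly the same dependency tree on any machine.
poetry show --tree visualizes the dependency tree, which makes conflicts visible (for example, xgboost and lightgbm requiring different numpy versions).
The requirements.txt that CI generates with poetry export is far more reliable than a handwritten one.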

4.2 Serving the model: using ONNX Runtime to bring a 1.2-second prediction down to 120 ms

Our original sklearn model took 1.2 seconds per prediction on CPU, which did not meet the business requirement of "sub-second response". The optimization path was:

Convert the model with skl2onnx.
Load it with onnxruntime and choose the execution providers.
Optimize the memory layout of the input data.

Step 1: model conversion

    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # Declare the input type (required for the conversion)
    initial_type = [('float_input', FloatTensorType([None, 10]))]  # 10 features
    onx = convert_sklearn(model, initial_types=initial_type)

    with open("model.onnx", "wb") as f:
        f.write(onx.SerializeToString())

Step 2: loading and tuning ONNX Runtime

    import numpy as np
    import onnxruntime as ort

    # Prefer CUDA when a GPU is available, otherwise fall back to CPU
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    sess = ort.InferenceSession("model.onnx", providers=providers)

    # Important: warm up so the first real request does not pay initialization cost
    dummy_input = np.random.rand(1, 10).astype(np.float32)
    sess.run(None, {'float_input': dummy_input})

    def predict_onnx(X: np.ndarray) -> np.ndarray:
        X = X.astype(np.float32)  # the converted model expects float32
        result = sess.run(None, {'float_input': X})
        return result[0]  # first output

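Step 3: memory layout
Converting a pandas DataFrame to NumPy with df.values does not guarantee a C-contiguous (row-major) array; pandas often returns a column-major or strided view. ONNX Runtime works on C-contiguous buffers, so we make the layout explicit with one extra line and avoid a hidden copy on every call:

    X = df[feature_cols].values.astype(np.float32)
    X = np.ascontiguousarray(X, dtype=np.float32)  # force C-order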

Measured results:

CPU mode: 1.2 s → 0.18 s (6.7x faster).
CUDA mode (T4 GPU): 1.2 s → 0.032 s (37.5x faster).
Memory usage dropped by 40%; the ONNX Runtime session was lighter than the in-memory sklearn model.
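The article does not say how these timings were taken. A small harness along the following lines is one way to produce comparable p50 and p99 numbers for any predict function; measure_latency is a hypothetical helper and the default run counts are arbitrary.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(fn: Callable[[], object], runs: int = 200, warmup: int = 10) -> Dict[str, float]:
    """Times fn() repeatedly and returns p50 and p99 latency in milliseconds."""
    for _ in range(warmup):  # discard cold-start effects such as lazy initialization
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50_ms": statistics.median(samples), "p99_ms": cuts[98]}


if __name__ == "__main__":
    # Any zero-argument callable works, e.g. lambda: predict_onnx(batch)
    print(measure_latency(lambda: sum(range(10_000))))
```

Reporting p99 next to the median matters here because the business requirement is about worst-case responsiveness, not the average.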

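Field note: ONNX conversion is not a cure-all. We tried it on an XGBoost model and lost 0.3% accuracy after conversion, because xgboost's predict_proba is implemented differently in ONNX. Our solution: convert only predict(), keep predict_proba() on native XGBoost, and branch with an if-else. A production service can live with "some paths are not optimal", but it cannot live with "the result is wrong".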

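4.3 Frontend build: Vite + TypeScript + Tailwind, a professional UI in 5 minutes

We do not use Create React App because its build configuration is too heavy. In our experience Vite's cold start is about 10x faster than CRA's:

    npm create vite@latest ml-web-app -- --template react-ts
    cd ml-web-app
    npm install
    # Pin Tailwind v3: the `init -p` command below belongs to the v3 CLI
    npm install -D tailwindcss@3 postcss autoprefixer
    npx tailwindcss init -p

A trimmed tailwind.config.js:

    // Vite's template is an ES module project, hence `export default`
    export default {
      content: ["./index.html", "./src/**/*.{js,jsx,ts,tsx}"],
      theme: {
        extend: {
          colors: {
            primary: '#3b82f6',   // blue-500
            secondary: '#6b7280', // gray-500
          },
        },
      },
      plugins: [],
    }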

The core component, PredictForm.tsx:

    import { useState, type FormEvent } from 'react';

    const inputClass =
      'w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500';
    const labelClass = 'block text-sm font-medium text-gray-700 mb-1';

    export default function PredictForm() {
      const [formData, setFormData] = useState({ user_id: '', age: '', income: '' });
      const [result, setResult] = useState<any>(null);
      const [loading, setLoading] = useState(false);

      const handleSubmit = async (e: FormEvent) => {
        e.preventDefault();
        setLoading(true);
        try {
          const res = await fetch('/api/predict', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
              model: 'user_risk_v3',
              inputs: {
                user_id: formData.user_id,
                features: {
                  age: Number(formData.age),
                  income: Number(formData.income),
                },
              },
            }),
          });
          // Treat 4xx/5xx as failures instead of rendering them as a result
          if (!res.ok) throw new Error(`HTTP ${res.status}`);
          setResult(await res.json());
        } catch (err) {
          console.error(err);
          alert('Request failed, please check your network');
        } finally {
          setLoading(false);
        }
      };

      return (
        <div className="max-w-2xl mx-auto p-4">
          <h1 className="text-2xl font-bold text-gray-800 mb-6">User Risk Prediction</h1>
          <form onSubmit={handleSubmit} className="space-y-4">
            <div>
              <label className={labelClass}>User ID</label>
              <input
                type="text"
                value={formData.user_id}
                onChange={(e) => setFormData({ ...formData, user_id: e.target.value })}
                className={inputClass}
                required
              />
            </div>
            <div className="grid grid-cols-2 gap-4">
              <div>
                <label className={labelClass}>Age</label>
                <input
                  type="number"
                  value={formData.age}
                  onChange={(e) => setFormData({ ...formData, age: e.target.value })}
                  className={inputClass}
                  required
                />
              </div>
              <div>
                <label className={labelClass}>Monthly income (CNY)</label>
                <input
                  type="number"
                  value={formData.income}
                  onChange={(e) => setFormData({ ...formData, income: e.target.value })}
                  className={inputClass}
                  required
                />
              </div>
            </div>
            <button
              type="submit"
              disabled={loading}
              className={`w-full py-2 px-4 rounded-md text-white font-medium ${
                loading ? 'bg-blue-400 cursor-not-allowed' : 'bg-blue-600 hover:bg-blue-700'
              }`}
            >
              {loading ? 'Predicting...' : 'Predict'}
            </button>
          </form>
          {result && (
            <div className="mt-8 p-4 bg-green-50 border border-green-200 rounded-md">
              <h2 className="text-lg font-semibold text-green-800 mb-2">Prediction result</h2>
              <p className="text-green-700">
                User <strong>{result.result?.label}</strong> (confidence{' '}
                {Math.round((result.result?.score || 0) * 100)}%)
              </p>
            </div>
          )}
        </div>
      );
    }

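What this component covers:

Native browser form validation (the required attribute).
The button is disabled during submission to prevent duplicate clicks.
Visual feedback while loading.
The result is shown in a green success panel; request failures surface through an alert.
A two-column grid for the age and income fields.

The build is a single command:

    npm run build   # outputs to the dist/ directory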

4.4 Docker deployment: Nginx reverse proxy + multi-process Uvicorn, holding up at 100 QPS

The production Dockerfile (Dockerfile.prod):

    FROM python:3.11-slim

    # Install system dependencies
    RUN apt-get update && apt-get install -y \
        nginx \
        && rm -rf /var/lib/apt/lists/*

    # Install Python dependencies into the system interpreter
    # (no virtualenv, so that uvicorn is on PATH for the entrypoint)
    WORKDIR /app
    COPY poetry.lock pyproject.toml ./
    RUN pip install poetry \
        && poetry config virtualenvs.create false \
        && poetry install --only main --no-root --no-interaction

    # Copy application code and models
    COPY app/ ./app/
    COPY models/ ./models/

    # Copy the Nginx configuration
    COPY nginx.conf /etc/nginx/nginx.conf

    # Expose the port
    EXPOSE 80

    # Startup script
    COPY entrypoint.sh /entrypoint.sh
    RUN chmod +x /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]

entrypoint.sh:

    #!/bin/bash
    # Start Uvicorn in the background
    uvicorn app.main:app \
        --host 127.0.0.1 \
        --port 8000 \
        --workers 4 \
        --limit-concurrency 100 \
        --timeout-keep-alive 5 \
        &

    # Start Nginx in the foreground as PID 1
    exec nginx -g "daemon off;"

Key parts of nginx.conf:

    events {
        worker_connections 1024;
    }

    http {
        upstream ml_backend {
            server 127.0.0.1:8000;
            # Active health checks need NGINX Plus; the open-source build just round-robins
        }

        server {
            listen 80;
            server_name _;

            location / {
                proxy_pass http://ml_backend;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                # Important: turn off buffering so long-lived connections do not block
                proxy_buffering off;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";

                # Raise the timeouts
                proxy_connect_timeout 30s;
                proxy_send_timeout 30s;
                proxy_read_timeout 30s;
            }

            # Static files (frontend dist


http://www.jsqmd.com/news/953729/