
DeepSeek-R1-Distill-Qwen-1.5B Math Symbol Recognition: Handwritten Formulas to LaTeX

news 2026/7/11 6:14:27


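1. Introduction

1.1 Use Case

In research, education, and engineering, entering mathematical formulas digitally is a frequent and tedious task. The traditional approach is to type LaTeX by hand, which is a high barrier for non-specialist users. Advances in deep learning make it possible to convert handwritten formulas into structured LaTeX expressions automatically. This article describes a web service built on the DeepSeek-R1-Distill-Qwen-1.5B model that aims for high-accuracy handwritten formula recognition and LaTeX conversion.

The system is a secondary development by the developer by113 小贝. It relies on a model distilled from reinforcement-learning reasoning data, which improves its handling of mathematical expressions while keeping the parameter count small.

1.2 Pain Points

Existing formula recognition tools commonly have the following problems:

Low accuracy on complex nested structures (such as multi-level integrals and matrices)
Local software or plugins must be installed, which makes deployment inconvenient
High response latency and a poor interactive experience
No support for end-to-end training or fine-tuning

Progress in the logical reasoning and sequence generation abilities of large language models offers a new way to address these problems.

1.3 What This Article Covers

This article explains how to use DeepSeek-R1-Distill-Qwen-1.5B to build a web service that runs on a GPU and automates the full path from image input to LaTeX output. It covers environment setup, starting the service, Docker deployment, and performance tuning.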

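2. Choosing the Technical Approach

2.1 Basis for Model Selection

Model	Parameters	Math reasoning	Inference speed	Ease of deployment
DeepSeek-R1-Distill-Qwen-1.5B	1.5B	⭐⭐⭐⭐☆	⭐⭐⭐⭐	⭐⭐⭐⭐
MathBERT	110M	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐
T5-based Seq2Seq	770M	⭐⭐⭐☆	⭐⭐⭐	⭐⭐
LLaMA-3-8B-Instruct	8B	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐

Weighing model size, inference efficiency, and understanding of mathematical semantics, DeepSeek-R1-Distill-Qwen-1.5B strikes a good balance between a light footprint and strong performance.

2.2 Core Advantages

Distillation: the model is distilled from reasoning data produced by the reinforcement-learning-trained DeepSeek-R1, which improves generalization in mathematical reasoning
Multi-task support: besides formula recognition, it also handles code generation, logical derivation, and similar tasks
Low resource footprint: 1.5B parameters run efficiently on consumer GPUs (for example RTX 3090/4090)
Open license: the MIT license permits commercial use and secondary development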

3. Implementation Steps

3.1 Environment Setup

Make sure the system meets the following requirements:

    # Check the Python version
    python --version   # must be >= 3.11

    # Verify the CUDA version
    nvidia-smi         # must support CUDA 12.8
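The same checks can be scripted so they run before the service starts. The following is a minimal sketch using only the standard library; the function names `python_ok` and `nvidia_smi_available` are hypothetical helpers, and the script only confirms that `nvidia-smi` is on the PATH, not which CUDA version the driver supports.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in this section


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= tuple(minimum)


def nvidia_smi_available() -> bool:
    """Return True when the nvidia-smi executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python >= 3.11:", python_ok())
    print("nvidia-smi found:", nvidia_smi_available())
```

Reading the supported CUDA version still requires inspecting the `nvidia-smi` output by hand.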

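Install the required dependencies. The version specifiers are quoted so that the shell does not treat `>` as output redirection, and `accelerate` is included because `device_map="auto"` in the loading code depends on it:

    pip install "torch>=2.9.1" \
        "transformers>=4.57.3" \
        "gradio>=6.2.0" \
        accelerate pillow opencv-python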

Note: creating an isolated virtual environment with conda or venv is recommended to avoid dependency conflicts.

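3.2 Model Loading and Caching

The model has been downloaded in advance and cached in the Hugging Face directory:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_path = "/root/.cache/huggingface/deepseek-ai/DeepSeek-R1-Distill-Qwen-1___5B"

    # Load the tokenizer from the local cache only (no network access)
    tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)

    # Let transformers pick the device placement and the checkpoint's dtype
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype="auto",
    )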

To pull the model from the remote hub instead, run:

huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
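Choosing between the local cache and the remote repository can be decided at start-up. The sketch below is one possible approach, not part of the original service: `resolve_model_source` is a hypothetical helper, and it treats a directory as a usable model only if it contains a `config.json`, which `from_pretrained` needs in order to load a local model directory.

```python
from pathlib import Path
from typing import Tuple

REPO_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"


def resolve_model_source(local_dir: str, repo_id: str = REPO_ID) -> Tuple[str, bool]:
    """Return (source, local_files_only) to pass on to from_pretrained.

    If local_dir looks like a complete model directory, use it offline;
    otherwise fall back to the hub repository id.
    """
    if (Path(local_dir) / "config.json").is_file():
        return local_dir, True
    return repo_id, False
```

A caller would then write `source, offline = resolve_model_source(model_path)` and pass `source` and `local_files_only=offline` to both `from_pretrained` calls.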

3.3 Image Preprocessing Module

Handwritten formula images are normalized before they reach the model:

    import cv2
    import numpy as np
    from PIL import Image

    def preprocess_image(image: Image.Image) -> str:
        # Convert to grayscale
        img = image.convert('L')
        img_array = np.array(img)

        # Binarize; with THRESH_OTSU the threshold is chosen automatically
        _, binary = cv2.threshold(
            img_array, 128, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
        )

        # Extract contours and sort their bounding boxes left to right
        contours, _ = cv2.findContours(
            binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
        )
        bounding_boxes = [cv2.boundingRect(c) for c in contours]
        sorted_boxes = sorted(bounding_boxes, key=lambda box: box[0])  # sort by x

        # Build a token sequence in visual order (simplified)
        tokens = []
        for x, y, w, h in sorted_boxes:
            if w > 10 and h > 10:  # filter out noise
                if h > w * 2:
                    tokens.append('\\frac{}{}')  # assume a fraction
                elif w > h * 2:
                    tokens.append('-')  # assume a minus sign
                else:
                    tokens.append('?')  # left for the model to identify
        return ' '.join(tokens)

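This module turns the image into a preliminary symbol sequence that becomes part of the model's input prompt. Only tall or wide shapes receive a guess; every other symbol is emitted as `?` for the model to infer.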
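The aspect-ratio heuristic can be separated from the OpenCV calls so that it can be exercised without an image. The sketch below mirrors the thresholds used in the preprocessing code; `classify_box` and `boxes_to_tokens` are hypothetical names introduced for illustration, and the boxes are assumed to be `(x, y, w, h)` tuples as returned by `cv2.boundingRect`.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def classify_box(w: int, h: int, min_size: int = 10) -> Optional[str]:
    """Map a bounding box to a token, or None when it is treated as noise."""
    if not (w > min_size and h > min_size):
        return None  # too small in at least one dimension
    if h > w * 2:
        return "\\frac{}{}"  # tall shape: assumed to be a fraction
    if w > h * 2:
        return "-"  # wide shape: assumed to be a minus sign
    return "?"  # anything else is left for the model


def boxes_to_tokens(boxes: List[Box]) -> str:
    """Sort boxes left to right and join the surviving tokens with spaces."""
    tokens = []
    for x, y, w, h in sorted(boxes, key=lambda box: box[0]):
        token = classify_box(w, h)
        if token is not None:
            tokens.append(token)
    return " ".join(tokens)


print(boxes_to_tokens([(60, 0, 50, 12), (0, 0, 12, 40), (120, 0, 20, 20), (5, 5, 3, 3)]))
```

One consequence of the noise filter is visible here: a stroke must be more than 10 pixels in both dimensions to survive, so a thin horizontal stroke such as a short minus sign is dropped before the width test is ever reached.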

3.4 Gradio Web Interface

The main program, app.py, is as follows:

    import gradio as gr
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # preprocess_image() from section 3.3 must be defined in, or imported into, this file.

    # Load the model
    MODEL_PATH = "/root/.cache/huggingface/deepseek-ai/DeepSeek-R1-Distill-Qwen-1___5B"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        device_map="auto",
        torch_dtype=torch.float16,
    ).eval()

    # Extract the content between the first and last $$ delimiters
    def extract_latex(text):
        start = text.find("$$")
        end = text.rfind("$$")
        if start != -1 and end != -1 and start != end:
            return text[start + 2:end].strip()
        return text.replace("$$", "").strip()

    # Inference function
    def recognize_formula(image):
        prompt = f"""
    You are an expert in mathematical formula recognition. Convert the visual features of the following handwritten formula into a standard LaTeX expression.
    The input is a preprocessed symbol sequence; use the context to infer the most likely expression.

    Symbol sequence: {preprocess_image(image)}

    Output only the LaTeX code, without any explanation.
    """
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                temperature=0.6,
                top_p=0.95,
                do_sample=True,
            )

        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        raw_output = tokenizer.decode(new_tokens, skip_special_tokens=True)
        latex_code = extract_latex(raw_output)
        return f"$$ {latex_code.strip()} $$"

    # Build the interface
    demo = gr.Interface(
        fn=recognize_formula,
        inputs=gr.Image(type="pil"),
        outputs=gr.Markdown(label="LaTeX 输出"),
        title="Handwritten Formula to LaTeX Converter",
        description="Upload a handwritten math formula image to get its LaTeX representation.",
        examples=[
            ["examples/integral.jpg"],
            ["examples/matrix.png"],
        ],
    )

    if __name__ == "__main__":
        # Gradio's launch() takes server_port, not port
        demo.launch(server_name="0.0.0.0", server_port=7860)
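The extraction step may need to cope with more than `$$` delimiters. The sketch below assumes that an R1-style model can emit its reasoning inside `<think>...</think>` tags before the final answer; that behaviour, and the helper name `clean_model_reply`, are assumptions and not something the service code above handles or confirms. Unlike `extract_latex`, this variant returns the first `$$...$$` block rather than the span from the first to the last delimiter.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def clean_model_reply(text: str) -> str:
    """Strip any reasoning block, then return the first $$...$$ body if present."""
    answer = THINK_BLOCK.sub("", text)
    # If only a closing tag survived (the opening tag was part of the prompt),
    # keep what follows the last closing tag.
    if "</think>" in answer:
        answer = answer.split("</think>")[-1]
    match = DISPLAY_MATH.search(answer)
    if match:
        return match.group(1).strip()
    return answer.replace("$$", "").strip()


print(clean_model_reply("<think>a bar with stacked terms</think>\n$$ \\frac{a}{b} $$"))
```

Removing the reasoning block first matters because any `$$` inside it would otherwise be picked up as the answer.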

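4. Practical Issues and Optimization

4.1 Common Problems and Solutions

Problem	Cause	Solution
Model fails to load	Wrong cache path	Check the permissions and completeness of `/root/.cache/huggingface`
GPU out of memory	Batch too large or VRAM in use by other processes	Set `torch_dtype=torch.float16` and lower `max_new_tokens`
Startup port already in use	Port 7860 is used by another process	Find the occupying process with `lsof -i:7860` and stop it
Garbled recognition results	Poor input image quality	Add denoising and contrast enhancement to the preprocessing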
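The port conflict in the table can also be detected from Python before the service is launched. This is a minimal sketch with the standard `socket` module; `is_port_free` is a hypothetical helper, and it only checks the given interface at the moment of the call, so another process could still claim the port afterwards.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

A launcher could call `is_port_free(7860)` and fall back to a different port, or exit with a clear message, instead of failing during start-up.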

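4.2 Performance Optimization Suggestions

Enable half-precision inference: `torch_dtype=torch.float16` reduces VRAM usage by about 40%.
Limit the maximum output length: `max_new_tokens=256` prevents runaway generation.
Use the KV cache: `use_cache=True` speeds up autoregressive generation.
Batch requests (advanced): replace native Hugging Face inference with vLLM or TGI to increase throughput.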
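The effect of half precision on the model weights can be estimated with simple arithmetic: memory is the parameter count times the bytes per parameter. The sketch below uses the nominal 1.5B figure from the model name as an assumption (the exact parameter count may differ) and counts weights only, not activations or the KV cache.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate the memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NOMINAL_PARAMS = 1.5e9  # assumption: nominal size taken from the model name

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(NOMINAL_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights take roughly 5.59 GiB in float32 and 2.79 GiB in float16. The weights themselves halve; the overall saving seen in practice can be smaller because other allocations do not shrink in the same proportion.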

5. Docker Deployment

5.1 Dockerfile Walkthrough

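    FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

    # Note: on Ubuntu 22.04, python3 and pip3 still point to Python 3.10
    # unless they are reconfigured after python3.11 is installed.
    RUN apt-get update && apt-get install -y \
        python3.11 \
        python3-pip \
        && rm -rf /var/lib/apt/lists/*

    WORKDIR /app
    COPY app.py .

    # The model cache is not copied into the image (COPY has no -r flag and
    # cannot read host paths outside the build context). It is mounted at
    # run time with -v /root/.cache/huggingface:/root/.cache/huggingface.

    # opencv-python-headless avoids the GUI libraries that opencv-python needs
    RUN pip3 install torch transformers accelerate gradio pillow opencv-python-headless

    EXPOSE 7860
    CMD ["python3", "app.py"]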

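The base image provides CUDA 12.1 and is compatible with mainstream NVIDIA drivers
The model cache is shared with the host through a volume mount, which avoids repeated downloads
Using CMD rather than ENTRYPOINT makes it easy to override the command when debugging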

5.2 Container Deployment Commands

    # Build the image
    docker build -t deepseek-r1-1.5b:latest .

    # Run the container
    docker run -d --gpus all -p 7860:7860 \
        -v /root/.cache/huggingface:/root/.cache/huggingface \
        --name deepseek-web deepseek-r1-1.5b:latest
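After `docker run` returns, the model may still be loading, so the web page is not reachable immediately. The sketch below polls an HTTP endpoint until it answers; `wait_for_http` is a hypothetical helper, and the URL `http://127.0.0.1:7860/` is an assumption based on the port mapping above. It bypasses any configured HTTP proxy because the target is local.

```python
import time
import urllib.error
import urllib.request

# Opener that ignores proxy environment variables (the target is local)
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll url until it returns a 2xx/3xx status or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=5) as resp:
                if 200 <= resp.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, reset, or timed out
        time.sleep(interval_s)
    return False
```

A deployment script might call `wait_for_http("http://127.0.0.1:7860/", timeout_s=120)` and print the output of `docker logs deepseek-web` if it returns False.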