GLM-OCR API Guide: Python Examples for Quick Integration into Your Project

news 2026/6/23 0:26:48

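1. GLM-OCR Overview and Core Capabilities

GLM-OCR is a professional-grade multimodal OCR model. It scores 94.6 on OmniDocBench V1.5, an authoritative document-parsing benchmark, which is a state-of-the-art result. Beyond plain text, it handles complex document structure, so it suits business scenarios that demand high recognition accuracy.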

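1.1 Four Core Functions

Text recognition: supports mixed Chinese and English text, with accuracy of up to 98%
Formula parsing: recognizes mathematical formulas in LaTeX format
Table restoration: preserves the original table structure, including merged cells
Information extraction: pulls key fields out of a document (such as invoice amount and date)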

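1.2 Technical Advantages

Lightweight design: runs on a single consumer-grade GPU
RESTful API that is easy to integrate into existing systems
Average response time under 500 ms (on an A10G GPU)
Supports PNG, JPG, JPEG, WEBP, and other image formats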
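
Because the service accepts several image formats, a caller may want to check the file type and label the Base64 payload with a matching MIME type before sending it. The following is a minimal sketch of that idea. The `SUPPORTED_TYPES` table and the `to_data_url` helper are hypothetical names introduced here for illustration, and the table covers only the four formats named above.

```python
import base64
from pathlib import Path

# Hypothetical lookup table, limited to the formats listed above
SUPPORTED_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_data_url(data: bytes, filename: str) -> str:
    """Return a data URL for image bytes, or raise ValueError for other file types."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{SUPPORTED_TYPES[suffix]};base64,{encoded}"
```

Rejecting an unsupported file on the client side avoids a round trip to the service for input it is unlikely to handle.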

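2. Environment Setup and API Service Configuration

2.1 Checking the Service Deployment

Before calling the API, make sure the service has started correctly:

# Check service status
supervisorctl status

# Expected output (example)
glm-ocr:glm-ocr-webui   RUNNING   pid 1234, uptime 0:05:23
glm-ocr:glm-ocr         RUNNING   pid 1235, uptime 0:05:23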

If the service is not running, start it with the following command:

supervisorctl restart glm-ocr:*

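2.2 Confirming the Ports

GLM-OCR uses two ports by default:

7860: Web UI (optional)
8080: API service (required)

Make sure the firewall allows the ports you use. The example below opens the API port:

# Ubuntu example
sudo ufw allow 8080/tcp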
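
To confirm from Python that the ports are reachable, a plain TCP connection test is usually enough. The sketch below assumes the default ports listed above. `is_port_open` is a hypothetical helper name, and a successful connection only shows that something is listening, not that the OCR service itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


if __name__ == "__main__":
    # Report the status of the two default ports
    for port in (7860, 8080):
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"port {port}: {state}")
```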

3. Calling the API from Python

3.1 Basic Text Recognition Example

The following is the simplest text recognition code:

import base64

import requests


def ocr_basic(image_path):
    # Read the image and Base64-encode it
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    url = "http://localhost:8080/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    {"type": "text", "text": "Text Recognition:"},
                ],
            }
        ]
    }

    response = requests.post(url, json=payload)
    return response.json()


# Usage example
result = ocr_basic("invoice.png")
print(result["choices"][0]["message"]["content"])
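
The example above indexes straight into `result['choices'][0]['message']['content']`, which raises an exception if the service returns an error body instead. A small accessor can make that lookup safer. This is a sketch that assumes the response shape used in the example; `extract_text` is a hypothetical helper name.

```python
def extract_text(result):
    """Return the recognized text from a response dict, or None if the shape is unexpected."""
    try:
        return result["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # Missing keys, an empty choices list, or a non-dict response
        return None


# Sample response in the shape the example above expects
sample = {"choices": [{"message": {"content": "Total: 42.00"}}]}
print(extract_text(sample))
```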

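3.2 Advanced Example: Table Recognition

For documents that contain tables, you can select the recognition mode through the text instruction: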

import base64

import requests


def ocr_table(image_path):
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    url = "http://localhost:8080/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    {"type": "text", "text": "Table Recognition:"},  # key instruction
                ],
            }
        ]
    }

    response = requests.post(url, json=payload)
    return response.json()


# The returned table data is usually in Markdown format
table_result = ocr_table("financial_report.png")
print(table_result["choices"][0]["message"]["content"])
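
Since the table usually comes back as Markdown, downstream code often needs it as rows and cells. The sketch below handles only simple pipe-delimited tables. `parse_markdown_table` is a hypothetical helper, the sample string is made up for illustration, and escaped pipes inside cells are not handled.

```python
def parse_markdown_table(markdown: str) -> list[list[str]]:
    """Split a simple pipe-delimited Markdown table into a list of rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the header separator row, e.g. | --- | :---: |
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


sample = """| Item | Amount |
| --- | ---: |
| Rent | 1200 |
| Power | 80 |"""
print(parse_markdown_table(sample))
```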

3.3 Batch Processing and Performance

When you need to process a large number of images, asynchronous requests are recommended:

import asyncio
import base64

import aiohttp

API_URL = "http://localhost:8080/v1/chat/completions"


async def ocr_one(session, path):
    # Encode one image and post it; the connection is released when the block exits
    with open(path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    {"type": "text", "text": "Text Recognition:"},
                ],
            }
        ]
    }
    async with session.post(API_URL, json=payload) as response:
        return await response.json()


async def async_ocr(image_paths):
    async with aiohttp.ClientSession() as session:
        # Send all requests concurrently; results come back in input order
        tasks = [ocr_one(session, path) for path in image_paths]
        return await asyncio.gather(*tasks)


# Usage example
image_list = ["doc1.png", "doc2.png", "doc3.png"]
results = asyncio.run(async_ocr(image_list))
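
Sending every request at once can overload a single-GPU service when the batch is large. One common way to cap the number of requests in flight is an `asyncio.Semaphore`. The sketch below is generic: `run_limited` is a hypothetical helper, `worker` stands in for any coroutine function that processes one item (such as one OCR request), and the default limit of 4 is an arbitrary assumption to tune for your hardware.

```python
import asyncio


async def run_limited(items, worker, limit=4):
    """Run worker(item) for every item, with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # wait here while `limit` workers are already running
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_ocr(path):
    # Stand-in for a real request so the sketch runs without a server
    await asyncio.sleep(0.01)
    return f"text from {path}"


if __name__ == "__main__":
    print(asyncio.run(run_limited(["doc1.png", "doc2.png", "doc3.png"], fake_ocr, limit=2)))
```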

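4. Advanced Features and Practical Tips

4.1 Mixed-Content Recognition

GLM-OCR can process complex documents that contain text, formulas, and tables at the same time: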

import base64

import requests


def mixed_ocr(image_path):
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    url = "http://localhost:8080/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    {
                        "type": "text",
                        "text": "Recognize all content in the document, including text, formulas, and tables",
                    },
                ],
            }
        ],
        "temperature": 0.3,  # lower randomness for more stable recognition
    }

    response = requests.post(url, json=payload)
    return response.json()
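
The examples so far differ only in the instruction text and the optional `temperature` field, so the payload can be built in one place. This is a sketch under that observation. `build_payload` and its parameters are hypothetical names, and the field layout simply mirrors the payloads shown above.

```python
def build_payload(image_b64, instruction, temperature=None, mime_type="image/png"):
    """Build a request body in the same layout as the examples above."""
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:{mime_type};base64,{image_b64}"},
                    {"type": "text", "text": instruction},
                ],
            }
        ]
    }
    if temperature is not None:
        # Only include the field when the caller sets it explicitly
        payload["temperature"] = temperature
    return payload
```

With a helper like this, switching between text, table, and mixed recognition becomes a change of the `instruction` argument rather than a copy of the whole request body.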

4.2 Structured Data Extraction

Use a natural-language instruction to extract specific information:

import base64
import mimetypes

import requests


def extract_info(image_path, query):
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    # Label the data URL with the file's actual type (falls back to PNG if unknown)
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"

    url = "http://localhost:8080/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:{mime_type};base64,{image_data}"},
                    {"type": "text", "text": query},
                ],
            }
        ]
    }

    response = requests.post(url, json=payload)
    return response.json()


# Example: extract invoice information
invoice_info = extract_info("invoice.jpg", "Extract from the invoice: issue date, amount, seller name")
print(invoice_info["choices"][0]["message"]["content"])
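
The extraction result is free text, so turning it into fields requires a parsing step. The source does not specify the output layout. The sketch below assumes, purely for illustration, that the model answers with one `name: value` pair per line; `parse_fields` is a hypothetical helper and the sample text is made up.

```python
import re


def parse_fields(text):
    """Parse 'name: value' lines into a dict; lines without a colon are ignored."""
    fields = {}
    for line in text.splitlines():
        # Accept both the ASCII colon and the full-width colon used in Chinese text
        parts = re.split(r"[:：]", line, maxsplit=1)
        if len(parts) == 2 and parts[0].strip():
            fields[parts[0].strip()] = parts[1].strip()
    return fields


sample = "Issue date: 2024-01-05\nAmount：1200.00\nSeller: ACME Ltd"
print(parse_fields(sample))
```

If the actual output uses a different layout, asking for a stricter format in the instruction text and adapting the parser accordingly is likely more reliable than guessing.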

4.3 Image Preprocessing Tips

To improve recognition accuracy, apply some simple preprocessing before calling the API:

import numpy as np
from PIL import Image, ImageOps


def preprocess_image(image_path):
    img = Image.open(image_path)

    # Orientation correction: apply the EXIF orientation tag, if present
    img = ImageOps.exif_transpose(img)

    # Contrast enhancement: convert to grayscale, then stretch to the full 0-255 range
    arr = np.array(img.convert("L"), dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi > lo:  # skip flat images to avoid dividing by zero
        arr = (arr - lo) * (255.0 / (hi - lo))
    return Image.fromarray(arr.astype("uint8"))


# Use the preprocessed image; save it as PNG to match the MIME type hard-coded in ocr_basic
processed_img = preprocess_image("low_contrast.jpg")
processed_img.save("processed.png")
result = ocr_basic("processed.png")
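
The contrast step above is a linear min-max stretch: each gray value p is mapped to (p - min) * 255 / (max - min), so the darkest pixel becomes 0 and the brightest becomes 255. The pure-Python sketch below works through that arithmetic on a few integer values. `stretch_contrast` is a hypothetical helper for illustration only and uses integer division, so it is not a replacement for the NumPy version.

```python
def stretch_contrast(pixels):
    """Linearly rescale integer gray values so the darkest maps to 0 and the brightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat input: nothing to stretch, and avoids dividing by zero
    # Integer arithmetic keeps the result exact and truncates like a cast to uint8
    return [(p - lo) * 255 // (hi - lo) for p in pixels]


# A low-contrast strip that only uses the range 50..150
print(stretch_contrast([50, 100, 150]))  # → [0, 127, 255]
```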

5. Error Handling and Performance Optimization

5.1 Handling Common Errors

A robust API call should include error-handling logic:

import base64

import requests


def robust_ocr(image_path):
    try:
        with open(image_path, "rb") as f:
            image_data = base64.b64encode(f.read()).decode("utf-8")

        url = "http://localhost:8080/v1/chat/completions"
        payload = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                        {"type": "text", "text": "Text Recognition:"},
                    ],
                }
            ]
        }

        response = requests.post(url, json=payload, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx status codes

        result = response.json()
        if "choices" not in result:
            raise ValueError("Invalid response format")
        return result["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        # Network errors, timeouts, and HTTP error statuses
        print(f"Request failed: {e}")
        return None
    except Exception as e:
        # File errors and unexpected response formats
        print(f"Processing failed: {e}")
        return None
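
Because `robust_ocr` returns `None` on failure, transient problems such as a brief timeout can be handled by retrying with a growing delay. The sketch below is one way to do that. `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions rather than values recommended by the service.

```python
import time


def with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func() until it returns a non-None value or the attempts run out."""
    for attempt in range(attempts):
        result = func()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None


# Usage with the function above (requires a running service):
# text = with_retries(lambda: robust_ocr("invoice.png"))
```

Retrying only makes sense for failures that may resolve on their own; a missing file or a malformed image fails the same way on every attempt.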

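5.2 Performance Optimization Tips

Connection reuse: create a Session object for high-frequency calls so connections are pooled
Timeouts: set a timeout to avoid waiting indefinitely
Result caching: cache the recognition result for identical images

An optimized example: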

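import base64
import hashlib
from functools import lru_cache

import requests

# Reuse one Session so repeated calls share pooled connections
session = requests.Session()


def get_image_hash(image_path):
    with open(image_path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


@lru_cache(maxsize=100)
def _cached_request(image_hash, image_path):
    # image_hash is part of the cache key, so a file whose contents change misses the cache
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    url = "http://localhost:8080/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    {"type": "text", "text": "Text Recognition:"},
                ],
            }
        ]
    }

    response = session.post(url, json=payload, timeout=5)
    response.raise_for_status()  # lru_cache does not store calls that raise
    return response.json()


def cached_ocr(image_hash, image_path):
    try:
        return _cached_request(image_hash, image_path)
    except requests.exceptions.RequestException:
        return None  # failures are not cached, so the next call tries again


# Usage example (an image file, since the payload is labeled image/png)
image_hash = get_image_hash("contract.png")
result = cached_ocr(image_hash, "contract.png")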

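6. Project Integration Suggestions

6.1 Microservice Architecture

We recommend wrapping GLM-OCR as an independent microservice:

Example project architecture:
└── Your business system
    ├── Web frontend
    ├── Business logic layer
    └── OCR service gateway  ← calls →  GLM-OCR microservice (local machine or intranet)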
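
In code, the gateway layer can be a thin class that is the only place where the business logic touches the OCR endpoint. The sketch below is one possible shape, not a prescribed design. `OCRGateway` and the injected `post` callable are hypothetical; in a real project `post` would typically wrap an HTTP client call and return the decoded JSON body.

```python
class OCRGateway:
    """Single entry point the business layer uses to reach the OCR service."""

    def __init__(self, api_url, post):
        self.api_url = api_url
        self._post = post  # callable(url, payload) -> response dict

    def recognize(self, image_b64, instruction="Text Recognition:"):
        payload = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "url": f"data:image/png;base64,{image_b64}"},
                        {"type": "text", "text": instruction},
                    ],
                }
            ]
        }
        result = self._post(self.api_url, payload)
        return result["choices"][0]["message"]["content"]


def fake_post(url, payload):
    # Stand-in transport so the sketch runs without a server
    return {"choices": [{"message": {"content": "hello"}}]}


gateway = OCRGateway("http://localhost:8080/v1/chat/completions", fake_post)
print(gateway.recognize("QUJD"))
```

Injecting the transport keeps the business layer testable without a GPU host and lets the service move to another machine by changing only the URL.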

6.2 Django Integration Example

Create an OCR service module in your Django project:

# ocr_service.py
import base64

import requests
from django.conf import settings


class OCRService:
    def __init__(self):
        self.api_url = settings.OCR_API_URL  # configured in settings.py

    def recognize_text(self, image_file):
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
        payload = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                        {"type": "text", "text": "Text Recognition:"},
                    ],
                }
            ]
        }
        response = requests.post(self.api_url, json=payload, timeout=10)
        return response.json()


# Usage example in views.py
from django.http import HttpResponseBadRequest, JsonResponse

from .ocr_service import OCRService


def process_document(request):
    # Reject non-POST requests and uploads without a "document" file
    if request.method == "POST" and "document" in request.FILES:
        uploaded_file = request.FILES["document"]
        ocr = OCRService()
        result = ocr.recognize_text(uploaded_file)
        return JsonResponse(result)
    return HttpResponseBadRequest()
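
The service class reads `settings.OCR_API_URL`, so that setting has to exist in `settings.py`. A minimal fragment might look like the following. The environment-variable override is an assumption added for convenience, and the default URL is the local endpoint used throughout this article.

```python
# settings.py (fragment)
import os

# Read by OCRService; the environment override is an assumption, not a requirement
OCR_API_URL = os.environ.get(
    "OCR_API_URL",
    "http://localhost:8080/v1/chat/completions",
)
```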

6.3 Flask Integration Example

Quickly create an OCR API gateway:

import base64

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OCR_API = "http://localhost:8080/v1/chat/completions"


@app.route("/api/ocr", methods=["POST"])
def ocr_proxy():
    if "file" not in request.files:
        return jsonify({"error": "No file uploaded"}), 400

    file = request.files["file"]
    image_data = base64.b64encode(file.read()).decode("utf-8")

    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "url": f"data:image/png;base64,{image_data}"},
                    # Optional form field; defaults to plain text recognition
                    {"type": "text", "text": request.form.get("instruction", "Text Recognition:")},
                ],
            }
        ]
    }

    response = requests.post(OCR_API, json=payload, timeout=10)
    return jsonify(response.json())


if __name__ == "__main__":
    app.run(port=5000)