
Using GLM-OCR with Ollama: Another Quick Way to Call the GLM-OCR Model

news 2026/3/31 0:27:32


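1. Project Overview

GLM-OCR is a multimodal OCR model built on the GLM-V encoder-decoder architecture and designed for complex document understanding. It combines a CogViT visual encoder pretrained on large-scale image-text data, a lightweight cross-modal connector, and a GLM-0.5B language decoder, and it supports text recognition, table recognition, and formula recognition.

GLM-OCR is usually called through a Gradio interface or the Python API. This article describes a more convenient route: calling the GLM-OCR model through the Ollama library. This simplifies deployment and lets developers integrate OCR into their own applications more quickly.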

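2. Environment Setup

2.1 Installing Ollama

The Ollama service itself, which provides the ollama command used in section 2.2, must already be installed. Then install the ollama Python package, a client for talking to a local or remote Ollama service: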

pip install ollama

2.2 Pulling the GLM-OCR Model

Use the Ollama command-line tool to pull the GLM-OCR model:

ollama pull glm-ocr:latest

This downloads the model files automatically. Once the download finishes, the model can be called through the API.
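Before calling the model, it can help to confirm that the pull succeeded. The sketch below reads the server's GET /api/tags endpoint, which lists the locally available models, and checks for glm-ocr:latest. The helper names (fetch_tags, model_names, is_pulled) are made up for this illustration, and the host assumes a default local install.

```python
import json
from urllib.request import urlopen


def fetch_tags(host: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server for its locally available models."""
    with urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_names(tags_payload: dict) -> list[str]:
    """Extract the model names from the JSON body of GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_pulled(tags_payload: dict, model: str = "glm-ocr:latest") -> bool:
    """Return True if `model` appears in the server's local model list."""
    return model in model_names(tags_payload)
```

With a server running, is_pulled(fetch_tags()) should return True once the pull has finished.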

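2.3 Configuring Project Dependencies

Create a pyproject.toml file to manage the project's dependencies. Of the packages listed, the scripts in this article import only ollama: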

[project]
name = "glm-ocr-ollama-inference"
version = "0.1.0"
description = "glm-ocr ollama api inference"
readme = "README.md"
requires-python = ">=3.13,<3.14"
dependencies = [
    "markupsafe==3.0.2",
    "ollama>=0.6.1",
    "torch",
    "torchaudio",
    "torchvision",
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
torchvision = [
    { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

3. Calling GLM-OCR Through the Ollama API

3.1 Basic Usage

Create a Python script (for example main.py) that calls the GLM-OCR model:

import asyncio

from ollama import Client


async def main():
    client = Client(
        host="http://localhost:11434",
    )
    # client.chat is a blocking call, so run it in a worker thread.
    response = await asyncio.to_thread(
        client.chat,
        model="glm-ocr:latest",
        messages=[
            {
                "role": "user",
                "content": """{
                    "text": "Text Recognition:",
                    "formula": "Formula Recognition:",
                    "table": "Table Recognition:"
                }""",
                "images": ["inputs/1.png"],
            }
        ],
    )
    print(response)
    response_text = response["message"]["content"]
    response_lines = response_text.strip().split("\n")
    for line in response_lines:
        print(line)


if __name__ == "__main__":
    asyncio.run(main())

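3.2 Supported Functions

Through the Ollama API, GLM-OCR supports three main functions:

Text recognition: use "text": "Text Recognition:" as the prompt
Table recognition: use "table": "Table Recognition:" as the prompt
Formula recognition: use "formula": "Formula Recognition:" as the prompt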
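The mapping from task to prompt can be kept in one place. The sketch below builds a single user message for client.chat() from a task name and an image path. The names PROMPTS and build_message are hypothetical, and json.dumps produces slightly different whitespace than the hand-written strings in this article; whether the model is sensitive to that difference is not something the article establishes.

```python
import json

# Prompt strings as listed in this article, keyed by the same task names
# used in the JSON-style content of the example script.
PROMPTS = {
    "text": "Text Recognition:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
}


def build_message(task: str, image_path: str) -> dict:
    """Build one user message for client.chat() for a single OCR task."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return {
        "role": "user",
        "content": json.dumps({task: PROMPTS[task]}),
        "images": [image_path],
    }
```

A call would then look like client.chat(model="glm-ocr:latest", messages=[build_message("table", "table.png")]).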

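3.3 Parameters

host: address of the Ollama service; the default is http://localhost:11434
model: name of the model to use, here glm-ocr:latest
messages: list of chat messages; each message carries the prompt in content and the image paths in images
images: field of a message that lists the image paths to recognize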
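A mistyped image path is easier to diagnose if it fails before the request is sent. Besides file paths, the Python client also accepts raw bytes in images, so one option is to read the file up front. The helper name load_image_bytes is an assumption made for this sketch.

```python
from pathlib import Path


def load_image_bytes(image_path: str) -> bytes:
    """Read an image file, failing early with a clear error if it is missing."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")
    return path.read_bytes()
```

The returned bytes can be passed in place of a path, for example "images": [load_image_bytes("inputs/1.png")].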

4. Practical Examples

4.1 Text Recognition

For images of ordinary documents, use text recognition:

# Runs inside main() from section 3.1, where client is defined.
response = await asyncio.to_thread(
    client.chat,
    model="glm-ocr:latest",
    messages=[
        {
            "role": "user",
            "content": """{ "text": "Text Recognition:" }""",
            "images": ["document.png"],
        }
    ],
)

4.2 Table Recognition

For images that contain tables, use table recognition:

# Runs inside main() from section 3.1, where client is defined.
response = await asyncio.to_thread(
    client.chat,
    model="glm-ocr:latest",
    messages=[
        {
            "role": "user",
            "content": """{ "table": "Table Recognition:" }""",
            "images": ["table.png"],
        }
    ],
)

4.3 Formula Recognition

For images that contain mathematical formulas, use formula recognition:

# Runs inside main() from section 3.1, where client is defined.
response = await asyncio.to_thread(
    client.chat,
    model="glm-ocr:latest",
    messages=[
        {
            "role": "user",
            "content": """{ "formula": "Formula Recognition:" }""",
            "images": ["formula.png"],
        }
    ],
)

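5. Performance Tips

5.1 Batch Processing

To process several images, loop over them and reuse a single client. The loop handles the images one after another, so it saves setup work per image rather than running requests in parallel: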

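import asyncio


async def process_batch(client, image_paths):
    # Send one request per image, in order, reusing the same client.
    responses = []
    for image_path in image_paths:
        response = await asyncio.to_thread(
            client.chat,
            model="glm-ocr:latest",
            messages=[
                {
                    "role": "user",
                    "content": """{ "text": "Text Recognition:" }""",
                    "images": [image_path],
                }
            ],
        )
        responses.append(response)
    return responses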
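A long batch should usually survive one unreadable image. The following sketch keeps going after a failure and reports results and errors separately. It takes a hypothetical recognize callable, any blocking function that maps an image path to a chat response, so the loop can be exercised without a running server.

```python
import asyncio
from typing import Callable


async def ocr_batch(
    recognize: Callable[[str], dict], image_paths: list[str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Run recognize(path) for each image in order.

    Returns (results, errors), both keyed by image path.
    """
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    for path in image_paths:
        try:
            # recognize is blocking, so run it in a worker thread.
            response = await asyncio.to_thread(recognize, path)
            results[path] = response["message"]["content"]
        except Exception as exc:
            # Record the failure and continue with the next image.
            errors[path] = repr(exc)
    return results, errors
```

In practice recognize would wrap the client.chat call from section 3.1 with the model name and prompt filled in.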

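5.2 Asynchronous Processing

Sending requests concurrently with asyncio can raise throughput. How many requests actually run in parallel depends on how the Ollama server is configured: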

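import asyncio


async def process_image(client, image_path):
    # client.chat is blocking, so run it in a worker thread.
    response = await asyncio.to_thread(
        client.chat,
        model="glm-ocr:latest",
        messages=[
            {
                "role": "user",
                "content": """{ "text": "Text Recognition:" }""",
                "images": [image_path],
            }
        ],
    )
    return response


async def process_all(client, image_paths):
    # Start one task per image and wait for all of them to finish.
    tasks = [process_image(client, path) for path in image_paths]
    return await asyncio.gather(*tasks)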
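Launching every request at once can queue far more work than the server will run. One common pattern, shown here as a sketch rather than a tuned setup, is to cap the number of in-flight requests with asyncio.Semaphore. The recognize callable and the default limit of 4 are assumptions for illustration.

```python
import asyncio
from typing import Callable


async def ocr_concurrent(
    recognize: Callable[[str], object],
    image_paths: list[str],
    max_in_flight: int = 4,
) -> list:
    """Run recognize(path) for all images with at most max_in_flight at once.

    Results come back in the same order as image_paths.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(path: str):
        # Wait for a free slot, then run the blocking call in a worker thread.
        async with semaphore:
            return await asyncio.to_thread(recognize, path)

    return await asyncio.gather(*(one(path) for path in image_paths))
```

The ollama package also provides an AsyncClient, which would remove the need for asyncio.to_thread; the semaphore pattern stays the same.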

5.3 Post-processing Results

The result returned by GLM-OCR can be used directly or post-processed as needed:

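def process_ocr_result(response):
    # Extract the recognized text from the chat response.
    content = response["message"]["content"]
    # Custom processing logic goes here.
    processed = content.replace("\n", "<br>")  # example: replace newlines with <br>
    return processed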
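If the processed text is going to be rendered as HTML, recognized characters such as < and & should be escaped before the line breaks are converted, otherwise they may be interpreted as markup. A minimal sketch using the standard library follows; the function name ocr_text_to_html is hypothetical.

```python
import html


def ocr_text_to_html(content: str) -> str:
    """Escape OCR text for HTML, then turn line breaks into <br> tags."""
    escaped = html.escape(content.strip())
    return escaped.replace("\n", "<br>")
```

It would be called on response["message"]["content"] in the same way as process_ocr_result.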