GLM-4-9B-Chat-1M Getting Started: Function Call and Code Execution in Practice

news 2026/6/17 19:07:56

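1. Introduction: a long-context specialist that can read 2 million characters

Today I'd like to introduce a genuinely practical AI model: GLM-4-9B-Chat-1M. Its standout feature is a context window of up to 1 million tokens, roughly 2 million Chinese characters. To put that in perspective, a 300-page book fits in a single prompt with plenty of room to spare, and the model can then answer your questions about it.

Although the model has only 9 billion parameters, it is quite capable. It needs about 18 GB of VRAM at FP16, or about 9 GB with the INT4-quantized version, so a single RTX 3090 or 4090 can run it. Keep in mind that these figures apply to short prompts: the KV cache grows with context length, so prompts approaching the full 1M-token window need considerably more memory.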
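The 18 GB figure can be checked with simple arithmetic: parameters times bytes per parameter. The sketch below does that for the weights only, assuming a round 9 billion parameters and 1 GB = 10^9 bytes. The helper name `weight_memory_gb` is made up for illustration. Note that the INT4 weights alone come out below the 9 GB quoted above; the difference is presumably runtime overhead (activations, KV cache, layers kept at higher precision), which this estimate does not model.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 9e9  # "9 billion parameters", rounded as in the text

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # half a byte per parameter

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.1f} GB")
```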

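I particularly like two of its core features: Function Call and code execution. Together they turn the model from a simple chatbot into an assistant that can actually get work done for you.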

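2. Environment setup and quick deployment

2.1 Hardware requirements

First check whether your machine meets the requirements:

Minimum: 16 GB VRAM (INT4-quantized version; this leaves headroom above the roughly 9 GB the quantized model itself needs)
Recommended: 24 GB VRAM (full FP16 version)
RAM: at least 32 GB of system memory
Storage: 20-40 GB of free space for the model files

An RTX 3090 or 4090, or a data-center card such as the A100 or H100, will run the model well.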

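2.2 One-step deployment

The simplest way to deploy is with a Docker container. Here is a quick-start command:

    # Pull the image (confirm the image name is available in your registry first)
    docker pull swanhub/glm-4-9b-chat-1m:latest

    # Start the service
    docker run -d --gpus all -p 7860:7860 \
        -v /path/to/models:/models \
        swanhub/glm-4-9b-chat-1m:latest

After a few minutes, open http://localhost:7860 in your browser to see the web interface.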
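Rather than guessing when "a few minutes" have passed, you can poll the port until it accepts connections. This is a minimal sketch using only the standard library. The helper name `wait_for_port` and its default timeouts are made up for illustration, and a successful TCP connection only shows that something is listening on port 7860, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0,
                  interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening on the port
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry
            time.sleep(interval_s)
    return False


# Example (blocks for up to five minutes while the container starts):
#   if wait_for_port("localhost", 7860):
#       print("Web UI is reachable at http://localhost:7860")
```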

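If you prefer to call the model directly from Python, install the dependencies like this:

    # tiktoken is included because the GLM-4 tokenizer's remote code imports it
    pip install transformers torch accelerate tiktoken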

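3. First conversation: trying out the long context

Let's start by trying the model's long-text handling. Here is a simple test:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the model and tokenizer
    model_name = "THUDM/glm-4-9b-chat-1m"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # half precision; without this, weights are typically loaded in float32
        trust_remote_code=True,
    ).cuda().eval()

    # Prepare a long text (a short placeholder here; in practice this can be a whole book)
    long_text = "This is a very long piece of text..."

    # Ask a question
    question = "Please summarize the main points of this article."

    # Generate the answer.
    # NOTE: model.chat() is the ChatGLM-style helper used throughout this tutorial.
    # If your checkpoint's remote code does not provide it, use
    # tokenizer.apply_chat_template(...) followed by model.generate(...) instead.
    response = model.chat(tokenizer, long_text + "\n\n" + question)
    print(response)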

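Even with very long input, the model can generally find and use the relevant passages to answer the question. This is especially useful for technical documentation, academic papers, and legal contracts.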
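Before sending a whole book, it helps to estimate whether it fits in the window. The sketch below uses the ratio implied earlier in this article (1M tokens for about 2M Chinese characters, so roughly 2 characters per token). That ratio is a rough average and will differ for English text or code. The helper names and the `reserve_for_answer` default are assumptions for illustration; for an exact count, tokenize the text and take the length of the resulting `input_ids`.

```python
import math

CONTEXT_TOKENS = 1_000_000  # context window stated in this article
CHARS_PER_TOKEN = 2.0       # implied by "1M tokens ~ 2M Chinese characters"; a rough average


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, reserve_for_answer: int = 2_000,
                    context_tokens: int = CONTEXT_TOKENS) -> bool:
    """True if the estimated prompt plus room for the answer fits in the window."""
    return estimate_tokens(text) + reserve_for_answer <= context_tokens


book = "字" * 300_000  # stand-in for a book of 300,000 characters
print(estimate_tokens(book), fits_in_context(book))
```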

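4. Function Call in practice

Function Call is the killer feature of GLM-4-9B-Chat-1M. It lets the model call functions you have defined in advance, as if you had handed it a toolbox.

4.1 Defining your tool functions

First, let's define a few practical functions:

    import re

    # Define some tool functions

    def get_weather(city: str) -> str:
        """Get weather information for a city."""
        # A real implementation would call a weather API here
        return f"{city}: sunny today, 25°C"

    def calculate_expression(expression: str) -> str:
        """Evaluate a math expression."""
        # Allow only digits, whitespace and + - * / ( ) . so eval() never sees names or calls.
        # This is a minimal guard, not a sandbox (a huge power like 9**9**9 can still hang).
        if not re.fullmatch(r"[0-9+\-*/().\s]+", expression):
            return "Unable to evaluate this expression"
        try:
            result = eval(expression)
            return f"{expression} = {result}"
        except (SyntaxError, TypeError, ArithmeticError):
            return "Unable to evaluate this expression"

    def search_information(query: str) -> str:
        """Search for information."""
        # A search engine API could be integrated here
        return f"Search results for '{query}': ..."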
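Passing model-supplied strings to `eval` is risky, because the model (or a user steering it) decides what gets evaluated. One common alternative is to parse the expression with Python's `ast` module and walk only the node types you explicitly allow. The following is a minimal sketch of that idea, limited to `+ - * /` on numeric literals. The function name `safe_eval` is made up for illustration and is not part of any GLM-4 tooling.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate + - * / arithmetic on numeric literals; reject everything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, powers, etc. all end up here
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("125 * 368"))  # 46000
```

Because function calls and names are never evaluated, an input such as `__import__('os')` raises `ValueError` instead of running.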

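4.2 Teaching the model to use the tools

Now let's describe the tools to the model. The `tools=` and `tool_choice=` arguments, and the `tool_calls` attribute used in the next step, follow the OpenAI-style tool-calling convention. Whether `model.chat` accepts them depends on the wrapper or server you run the model behind, so treat the call below as a sketch of the flow rather than a guaranteed signature.

    # Describe the tools
    tools = [
        {
            "name": "get_weather",
            "description": "Get weather information for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
        {
            "name": "calculate_expression",
            "description": "Evaluate a math expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression"}
                },
                "required": ["expression"],
            },
        },
    ]

    # Talk to the model and let it use the tools
    messages = [{
        "role": "user",
        "content": "What's the weather like in Beijing today? Then work out 125 times 368.",
    }]

    response = model.chat(
        tokenizer,
        messages,
        tools=tools,
        tool_choice="auto",
    )

    print("Model reply:", response)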

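The model recognizes that it needs two tools: a weather lookup first, then the calculation. Its reply looks something like this:

I need to call two tools to answer your question. First I'll look up the weather in Beijing, then calculate 125*368.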
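The arguments in a tool call are generated text, so it is worth checking them against the tool description before running anything. Below is a minimal sketch of such a check, covering only `required` fields and a few basic JSON types. The function name `validate_arguments` is made up for illustration; a full JSON Schema validator would cover far more.

```python
from typing import List

# Map the JSON Schema type names used in the tool descriptions to Python types
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_arguments(tool_spec: dict, arguments: dict) -> List[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    params = tool_spec.get("parameters", {})
    properties = params.get("properties", {})
    problems = []
    for name in params.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        expected = JSON_TYPES.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {properties[name]['type']}")
    return problems


weather_tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}

print(validate_arguments(weather_tool, {"city": "Beijing"}))  # []
print(validate_arguments(weather_tool, {}))
```

When the list is non-empty, one option is to send the problems back to the model as the tool result so it can correct the call.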

4.3 Handling the tool call results

Next we execute the tool calls and return the results to the model:

    import json

    # Map tool names to the Python functions defined earlier
    available_tools = {
        "get_weather": get_weather,
        "calculate_expression": calculate_expression,
    }

    # Parse the tool calls returned by the model
    if response.tool_calls:
        # Add the assistant's tool-call turn to the conversation
        messages.append({"role": "assistant", "content": None, "tool_calls": response.tool_calls})

        for tool_call in response.tool_calls:
            # Extract the arguments and run the matching function
            args = json.loads(tool_call.function.arguments)
            result = available_tools[tool_call.function.name](**args)
            # Pair each result with the id of the call that produced it,
            # so the order in which the model lists its calls does not matter
            messages.append({"role": "tool", "content": result, "tool_call_id": tool_call.id})

    # Get the final answer
    final_response = model.chat(tokenizer, messages)
    print("Final answer:", final_response)

This yields the complete answer: "It is sunny in Beijing today, 25°C. 125 times 368 equals 46000."
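The dispatch step can be tried without a GPU by faking the model's reply. The sketch below builds stand-in objects with the same attribute layout the tutorial code reads (`tool_calls`, `function.name`, `function.arguments`, `id`) and deliberately lists the calls in the "wrong" order. The helper `run_tool_calls`, the call ids, and the simplified tool stubs are all assumptions for illustration only.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    return f"{city}: sunny today, 25°C"  # canned reply, like the tutorial's stub


def calculate_expression(expression: str) -> str:
    # Stand-in limited to "a * b" so this sketch needs no eval()
    left, right = expression.split("*")
    return f"{expression} = {int(left) * int(right)}"


REGISTRY = {"get_weather": get_weather, "calculate_expression": calculate_expression}


def run_tool_calls(tool_calls, registry):
    """Execute each call and build one 'tool' message per call, keyed by its id."""
    messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            content = f"unknown tool: {call.function.name}"
        else:
            content = func(**json.loads(call.function.arguments))
        messages.append({"role": "tool", "content": content, "tool_call_id": call.id})
    return messages


# Fake model output: the calculation is listed before the weather lookup
fake_response = SimpleNamespace(tool_calls=[
    SimpleNamespace(id="call_calc", function=SimpleNamespace(
        name="calculate_expression", arguments='{"expression": "125 * 368"}')),
    SimpleNamespace(id="call_weather", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Beijing"}')),
])

tool_messages = run_tool_calls(fake_response.tool_calls, REGISTRY)
for message in tool_messages:
    print(message)
```

Looking functions up by name, instead of relying on positions such as `tool_calls[0]`, keeps every result attached to the right call id whatever order the model chooses.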

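5. A closer look at code execution

Code execution lets the model solve tasks by writing and running Python, which is especially useful for data processing and numerical work. Note that the model itself only produces the code: actually running it requires an execution environment on your side, for example a sandboxed interpreter exposed to the model as a tool.

5.1 Basic code execution

Let's start with a simple example:

    # Ask the model to write and run code
    code_request = """
    Write a Python function that computes the n-th Fibonacci number,
    then compute the value of the 20th term.
    """

    response = model.chat(tokenizer, code_request)
    print(response)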

The model generates code along these lines, which is then run in your execution environment:

    def fibonacci(n):
        if n <= 1:
            return n
        a, b = 0, 1
        for _ in range(2, n + 1):
            a, b = b, a + b
        return b

    result = fibonacci(20)
    print(f"The 20th Fibonacci number is: {result}")
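Running generated code has to happen somewhere outside the model. One simple way to do it is a separate Python process with a timeout, sketched below with the standard library. The helper name `run_python` and the 10-second default are assumptions for illustration. A subprocess gives process isolation and a time limit but is not a security sandbox, so untrusted code still calls for a container or similar.

```python
import subprocess
import sys

# Code as the model might return it (taken from the Fibonacci example)
generated_code = """
def fibonacci(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

print(fibonacci(20))
"""


def run_python(code: str, timeout_s: float = 10.0) -> str:
    """Run code in a fresh interpreter and return its stdout; raise on failure."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if the code hangs
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()


output = run_python(generated_code)
print(output)  # 6765
```

The captured output (or the error text) can then be appended to the conversation so the model can report or correct its result.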

5.2 Data processing in practice

Code execution is particularly handy for data processing:

    # Ask the model to process some data
    data_analysis = """
    Here is a set of sales figures: [120, 150, 180, 200, 160, 140, 190, 210, 230, 250]
    Compute the average, maximum and minimum sales, and write a short analysis report.
    """

    response = model.chat(tokenizer, data_analysis)
    print(response)

The model produces code and a report similar to this:

    sales_data = [120, 150, 180, 200, 160, 140, 190, 210, 230, 250]

    average_sales = sum(sales_data) / len(sales_data)
    max_sales = max(sales_data)
    min_sales = min(sales_data)

    print(f"Average sales: {average_sales}")
    print(f"Highest sales: {max_sales}")
    print(f"Lowest sales: {min_sales}")

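6. Real-world usage examples

6.1 Analyzing technical documentation

Suppose you have a very long technical document and want to find specific information quickly:

    # Load the long document
    with open("long_technical_document.txt", "r", encoding="utf-8") as f:
        long_document = f.read()

    # Questions to ask
    questions = [
        "Which security best practices does the document mention?",
        "What is Chapter 3 mainly about?",
        "List every API endpoint mentioned and what it is for.",
    ]

    for question in questions:
        response = model.chat(tokenizer, long_document + "\n\n" + question)
        print(f"Question: {question}")
        print(f"Answer: {response}\n")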
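The loop above sends the entire document once per question, so a very long document is processed three times. If that cost matters, one option is to put all questions into a single prompt so the document is encoded only once. The sketch below shows only the prompt construction. The helper name `build_batched_prompt` and the instruction wording are assumptions for illustration, and answers to batched questions may be less detailed than when each is asked separately.

```python
from typing import List


def build_batched_prompt(document: str, questions: List[str]) -> str:
    """Combine one document and several questions into a single prompt."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        f"{document}\n\n"
        "Answer each of the following questions about the document above. "
        "Number your answers to match.\n"
        f"{numbered}"
    )


demo_document = "Chapter 3 describes the /login and /logout endpoints."
demo_questions = [
    "What is Chapter 3 mainly about?",
    "List every API endpoint mentioned and what it is for.",
]

prompt = build_batched_prompt(demo_document, demo_questions)
print(prompt)
```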

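6.2 Automated report generation

Combining Function Call with code execution, you can build an automated reporting system:

    # Define the report-generation tool
    def generate_report(data_source: str, report_type: str):
        """Generate a data analysis report."""
        # Placeholder: connect to a database or API here to fetch the data.
        # As written, no data is fetched, so the model only sees the source name.
        report_request = f"""
        Using the data from {data_source}, generate a {report_type} report.
        Include trend analysis, key metrics, and actionable insights.
        """
        return model.chat(tokenizer, report_request)

    # Expose the tool to the model
    tools = [
        {
            "name": "generate_report",
            "description": "Generate a data analysis report",
            "parameters": {
                "type": "object",
                "properties": {
                    "data_source": {"type": "string", "description": "Data source"},
                    "report_type": {"type": "string", "description": "Report type"},
                },
                "required": ["data_source", "report_type"],
            },
        }
    ]

    response = model.chat(
        tokenizer,
        "Please generate a monthly sales report from the sales database",
        tools=tools,
    )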

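7. Performance tuning tips

Here are a few practical tips for running the model more efficiently:

7.1 Speeding up inference with vLLM

    # Deploy with vLLM
    pip install vllm

    # --trust-remote-code is needed because the model ships custom code.
    # --max-model-len caps the context so the KV cache fits on one GPU; 65536 is an
    # illustrative value, raise it (along with --tensor-parallel-size) if you have the memory.
    python -m vllm.entrypoints.api_server \
        --model THUDM/glm-4-9b-chat-1m \
        --trust-remote-code \
        --max-model-len 65536 \
        --tensor-parallel-size 1 \
        --gpu-memory-utilization 0.9 \
        --enable-chunked-prefill \
        --max-num-batched-tokens 8192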
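Once the server is up, it can be queried over HTTP. The sketch below only builds the request, so it runs without a server. It assumes vLLM's demo `api_server` is listening on its default port 8000 and exposes a `POST /generate` endpoint that takes a `prompt` plus sampling parameters and returns a JSON object with a `text` list. Check these details against your installed vLLM version. The helper name `build_generate_request` is made up for illustration. This raw endpoint does not apply the chat template, and vLLM itself describes this server as a demo and points to its OpenAI-compatible server for production use.

```python
import json
import urllib.request


def build_generate_request(prompt: str, base_url: str = "http://localhost:8000",
                           max_tokens: int = 256,
                           temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("Explain chunked prefill in one sentence.")
print(request.full_url)

# To send it once the server is running:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp)["text"])
```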

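7.2 Reducing VRAM usage with quantization

If you are short on VRAM, you can load the model with 4-bit (INT4) quantization:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Requires the bitsandbytes package: pip install bitsandbytes
    model_name = "THUDM/glm-4-9b-chat-1m"

    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quantization_config,
        device_map="auto",  # let accelerate place the quantized weights on the GPU
        trust_remote_code=True,
    )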