
[Design and Implementation of LangChain's Language Model Components - 02] Polymorphic Message Content: The Foundation of Multimodal AI Solutions

news 2026/7/28 11:04:40

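As the base class for messages, BaseMessage stores the raw content in its content field, which is either a string or a list whose items are strings or dictionaries. The raw content is converted into a list of ContentBlock items and exposed through the content_blocks property. The body of a message can be plain string text, multimedia content (such as an image, audio, or video), or a binary file. Each form of content maps to its own ContentBlock type, and the relationships among these types are shown in the following UML class diagram (the boxed portion).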

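ContentBlock is not a base class but a union of six types, all of which ultimately resolve to plain typed dictionaries (TypedDict). These types share several members: a type field that identifies the specific block type, an id field that serves as a unique identifier, an index field that gives the block's current offset position, and an extras field that holds additional data.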

    class BaseMessage(Serializable):
        content: str | list[str | dict]

        @property
        def content_blocks(self) -> list[types.ContentBlock]: ...

    ContentBlock = (
        TextContentBlock
        | InvalidToolCall
        | ReasoningContentBlock
        | NonStandardContentBlock
        | DataContentBlock
        | ToolContentBlock
    )
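Because every block is a plain dictionary carrying a type field, consumers can branch on that field instead of using isinstance checks. The following is a minimal sketch of that idea using only the standard library. The functions normalize_content and summarize are hypothetical helpers written for illustration, and the string-to-text-block rule is a simplifying assumption, not LangChain's actual conversion logic.

```python
from __future__ import annotations

from typing import Any


def normalize_content(content: str | list[str | dict[str, Any]]) -> list[dict[str, Any]]:
    """Simplified stand-in: turn raw content into a list of block-shaped dicts."""
    items = [content] if isinstance(content, str) else content
    blocks: list[dict[str, Any]] = []
    for item in items:
        if isinstance(item, str):
            # Assumption: a bare string becomes a text block.
            blocks.append({"type": "text", "text": item})
        else:
            # Dictionaries are passed through unchanged in this sketch.
            blocks.append(item)
    return blocks


def summarize(block: dict[str, Any]) -> str:
    """Dispatch on the `type` discriminator shared by every block."""
    kind = block["type"]
    if kind == "text":
        return f"text: {block['text']}"
    if kind in {"image", "video", "audio", "file", "text-plain"}:
        return f"data ({kind}): {block.get('mime_type', 'unknown MIME type')}"
    return f"other: {kind}"


raw = [
    "Describe this picture.",
    {"type": "image", "url": "https://example.com/cat.png", "mime_type": "image/png"},
]
for block in normalize_content(raw):
    print(summarize(block))
```

In real code you would read message.content_blocks rather than normalizing the content yourself; the sketch only shows why a shared type field makes the union easy to consume.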

1. TextContentBlock

The payload of a TextContentBlock is plain string text. Its type is "text", and the text that forms the body is stored in the text field.

    class TextContentBlock(TypedDict):
        type: Literal["text"]
        id: NotRequired[str]
        text: str
        annotations: NotRequired[list[Annotation]]
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]

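Its annotations field holds a list of Annotation items that attach metadata to the text. Annotation is a union of Citation and NonStandardAnnotation. In LangChain's multimodal and RAG stack, Citation is the most important annotation type on a TextContentBlock, because it links the model's answer back to the original data source. When a model generates an answer from external documents (such as PDFs, web pages, or databases), it inserts citation markers into the text and provides the detailed metadata for each citation in the annotations field of the message.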

    Annotation = Citation | NonStandardAnnotation

    class Citation(TypedDict):
        type: Literal["citation"]
        id: NotRequired[str]
        url: NotRequired[str]
        title: NotRequired[str]
        start_index: NotRequired[int]
        end_index: NotRequired[int]
        cited_text: NotRequired[str]
        extras: NotRequired[dict[str, Any]]

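Citation likewise has its own type, "citation". Its url, title, start_index, end_index, and cited_text fields give the address of the source, its title, the start and end positions of the cited span, and the cited text. Any annotation other than this "standard" citation form is represented by the NonStandardAnnotation type. Its type is "non_standard_annotation", and the annotation data is stored as a dictionary in the value field.

    class NonStandardAnnotation(TypedDict):
        type: Literal["non_standard_annotation"]
        id: NotRequired[str]
        value: dict[str, Any]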
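To make the role of the Citation fields concrete, here is a small sketch that turns the citations on a text block into footnote lines. It works on plain dictionaries shaped like the TypedDicts above. The helper render_citations is hypothetical, and the sketch assumes that start_index and end_index are character offsets into the block's text with an exclusive end, which the source does not specify.

```python
from __future__ import annotations

from typing import Any


def render_citations(block: dict[str, Any]) -> list[str]:
    """Return one footnote line per citation annotation on a text block."""
    text = block["text"]
    notes: list[str] = []
    for ann in block.get("annotations", []):
        if ann.get("type") != "citation":
            continue  # non-standard annotations are ignored in this sketch
        start, end = ann.get("start_index"), ann.get("end_index")
        # Assumption: character offsets into `text`, end exclusive.
        span = text[start:end] if start is not None and end is not None else ""
        source = ann.get("title") or ann.get("url") or "unknown source"
        notes.append(f'"{span}" <- {source}')
    return notes


answer = {
    "type": "text",
    "text": "Paris is the capital of France.",
    "annotations": [
        {
            "type": "citation",
            "title": "World Atlas",
            "url": "https://example.com/atlas",
            "start_index": 0,
            "end_index": 5,
        },
        # All Citation fields except `type` are optional, so offsets may be absent.
        {"type": "citation", "url": "https://example.com/geo"},
    ],
}
print(render_citations(answer))
```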

2. InvalidToolCall

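InvalidToolCall is a structured error type designed for model hallucination and parsing failures. When the model analyzes the prompt and decides that a tool needs to be called, it tries to generate the corresponding ToolCall. If the generated arguments do not have a valid structure, no exception is raised. Instead, an InvalidToolCall is produced to describe this "failed to generate a ToolCall" case.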

    class InvalidToolCall(TypedDict):
        type: Literal["invalid_tool_call"]
        id: str | None
        name: str | None
        args: str | None
        error: str | None
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]

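The type of InvalidToolCall is "invalid_tool_call". Its id, name, args, and error fields give the unique identifier, the tool name, the input arguments (kept as the raw string), and the error description of the attempted tool call.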
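The "no exception, return a structured error instead" behaviour can be sketched as follows. The function parse_tool_call is a hypothetical helper, not LangChain's parser; it assumes the model emits its arguments as a JSON string and that valid arguments must be a JSON object.

```python
from __future__ import annotations

import json
from typing import Any


def parse_tool_call(call_id: str | None, name: str | None, raw_args: str | None) -> dict[str, Any]:
    """Return a tool_call-shaped dict, or an invalid_tool_call-shaped one on failure."""
    try:
        args = json.loads(raw_args or "")
        if not isinstance(args, dict):
            raise ValueError("tool arguments must be a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return {
            "type": "invalid_tool_call",
            "id": call_id,
            "name": name,
            "args": raw_args,  # the raw, unparsed string is preserved
            "error": str(exc),
        }
    return {"type": "tool_call", "id": call_id, "name": name, "args": args}


good = parse_tool_call("call_1", "get_weather", '{"city": "Paris"}')
bad = parse_tool_call("call_2", "get_weather", '{"city": "Par')
print(good["type"], bad["type"])
```

Keeping the raw args string and the error text lets the caller feed the failure back to the model for a retry instead of aborting the whole run.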

3. ReasoningContentBlock

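ReasoningContentBlock is a structured content block designed for reasoning models. Its use marks the shift of large models from answering directly to an explicit "think first, then answer" stage. Its type is "reasoning", and the reasoning itself is described by the text in the reasoning field.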

    class ReasoningContentBlock(TypedDict):
        type: Literal["reasoning"]
        id: NotRequired[str]
        reasoning: NotRequired[str]
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]
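A typical use of the dedicated "reasoning" type is to keep the model's thinking apart from its final answer, for example to log one and display the other. The sketch below does this over a list of block-shaped dictionaries; split_reasoning is a hypothetical helper, and the sample blocks are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def split_reasoning(blocks: list[dict[str, Any]]) -> tuple[str, str]:
    """Separate reasoning text from answer text in a list of content blocks."""
    # `reasoning` is declared NotRequired, so fall back to an empty string.
    reasoning = "".join(b.get("reasoning", "") for b in blocks if b["type"] == "reasoning")
    answer = "".join(b["text"] for b in blocks if b["type"] == "text")
    return reasoning, answer


blocks = [
    {"type": "reasoning", "reasoning": "17 is only divisible by 1 and itself."},
    {"type": "reasoning"},  # a reasoning block may carry no text at all
    {"type": "text", "text": "Yes, 17 is prime."},
]
thoughts, answer = split_reasoning(blocks)
print("thoughts:", thoughts)
print("answer:", answer)
```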

4. NonStandardContentBlock

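NonStandardContentBlock is a typical compatibility layer. It exists to resolve the conflict between the non-standard outputs produced by a fast-moving model industry and LangChain's core standard. Model vendors compete fiercely and frequently introduce new forms of content (for example custom 3D rendering data, a particular math formula format, or a proprietary file reference structure). Until LangChain's core library defines a dedicated ContentBlock type for such content, it is represented uniformly as a NonStandardContentBlock. Its type is "non_standard", and the payload is stored in the dictionary held by the value field.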

    class NonStandardContentBlock(TypedDict):
        type: Literal["non_standard"]
        id: NotRequired[str]
        value: dict[str, Any]
        index: NotRequired[int | str]
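The fallback described above can be sketched as a wrapper that leaves recognised blocks alone and tucks everything else into value. The names KNOWN_TYPES and to_block are hypothetical, the list of known types is taken from the type names mentioned in this article, and the "mesh_3d" payload is an invented example of a vendor-specific format.

```python
from __future__ import annotations

from typing import Any

# Type discriminators named in this article (an illustrative list, not an official one).
KNOWN_TYPES = {
    "text", "invalid_tool_call", "reasoning", "non_standard",
    "image", "video", "audio", "text-plain", "file",
    "tool_call", "tool_call_chunk",
    "server_tool_call", "server_tool_call_chunk", "server_tool_result",
}


def to_block(raw: dict[str, Any]) -> dict[str, Any]:
    """Pass recognised blocks through; wrap anything else as non_standard."""
    if raw.get("type") in KNOWN_TYPES:
        return raw
    # The original payload is kept intact so no provider data is lost.
    return {"type": "non_standard", "value": raw}


vendor_payload = {"type": "mesh_3d", "vertices": [[0, 0, 0], [1, 0, 0], [0, 1, 0]]}
wrapped = to_block(vendor_payload)
print(wrapped["type"], wrapped["value"]["type"])
```

Downstream code that understands the vendor format can still reach it through wrapped["value"], while code that does not can skip the block safely.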

5. DataContentBlock

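DataContentBlock, which represents "data content", is also not a concrete type but a union of five concrete types that correspond to five forms of content: image, video, audio, plain text, and file. These blocks closely resemble the body of an HTTP request or response, and the MIME type carried in their mime_type field has exactly the same meaning as in HTTP.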

    DataContentBlock = (
        ImageContentBlock
        | VideoContentBlock
        | AudioContentBlock
        | PlainTextContentBlock
        | FileContentBlock
    )

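The types of these five concrete data content blocks are "image", "video", "audio", "text-plain", and "file". Besides mime_type, they share a file_id field that identifies a file, a url field that gives the target address, and a base64 field that holds the Base64-encoded content. In addition to the text field that holds the text content, PlainTextContentBlock has title and context fields for the title and the context.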

    class ImageContentBlock(TypedDict):
        type: Literal["image"]
        id: NotRequired[str]
        file_id: NotRequired[str]
        mime_type: NotRequired[str]
        index: NotRequired[int | str]
        url: NotRequired[str]
        base64: NotRequired[str]
        extras: NotRequired[dict[str, Any]]

    class VideoContentBlock(TypedDict):
        type: Literal["video"]
        id: NotRequired[str]
        file_id: NotRequired[str]
        mime_type: NotRequired[str]
        index: NotRequired[int | str]
        url: NotRequired[str]
        base64: NotRequired[str]
        extras: NotRequired[dict[str, Any]]

    class AudioContentBlock(TypedDict):
        type: Literal["audio"]
        id: NotRequired[str]
        file_id: NotRequired[str]
        mime_type: NotRequired[str]
        index: NotRequired[int | str]
        url: NotRequired[str]
        base64: NotRequired[str]
        extras: NotRequired[dict[str, Any]]

    class PlainTextContentBlock(TypedDict):
        type: Literal["text-plain"]
        id: NotRequired[str]
        file_id: NotRequired[str]
        mime_type: Literal["text/plain"]
        index: NotRequired[int | str]
        url: NotRequired[str]
        base64: NotRequired[str]
        text: NotRequired[str]
        title: NotRequired[str]
        context: NotRequired[str]
        extras: NotRequired[dict[str, Any]]

    class FileContentBlock(TypedDict):
        type: Literal["file"]
        id: NotRequired[str]
        file_id: NotRequired[str]
        mime_type: NotRequired[str]
        index: NotRequired[int | str]
        url: NotRequired[str]
        base64: NotRequired[str]
        extras: NotRequired[dict[str, Any]]
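Since these blocks are plain dictionaries, building one from local bytes takes only the standard library. The sketch below fills the base64 and mime_type fields of an image block; image_block_from_bytes is a hypothetical helper, and guessing the MIME type from the file name is an assumption made for brevity.

```python
from __future__ import annotations

import base64
import mimetypes
from typing import Any


def image_block_from_bytes(data: bytes, filename: str) -> dict[str, Any]:
    """Build an image-shaped content block that carries its data inline as Base64."""
    block: dict[str, Any] = {
        "type": "image",
        "base64": base64.b64encode(data).decode("ascii"),
    }
    # Assumption: the file extension is a good enough hint for the MIME type.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type:
        block["mime_type"] = mime_type
    return block


# The same kind of block can instead point at remote content through `url`.
remote_image = {"type": "image", "url": "https://example.com/cat.png", "mime_type": "image/png"}

inline_image = image_block_from_bytes(b"\x89PNG\r\n\x1a\n", "cat.png")
print(inline_image["mime_type"], len(inline_image["base64"]))
```

The url, base64, and file_id fields are all optional, so a block typically carries whichever one matches how the content is stored.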

6. ToolContentBlock

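ToolContentBlock is likewise not a concrete type but a union of five types related to tool calling, including ToolCall and ToolCallChunk introduced earlier. Those two are produced by the language model as structured descriptions of a "tool call". They are returned to the Agent through AIMessage and AIMessageChunk respectively, and the Agent then performs the call.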

    ToolContentBlock = (
        ToolCall
        | ToolCallChunk
        | ServerToolCall
        | ServerToolCallChunk
        | ServerToolResult
    )

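If we use HTTP as an analogy, this approach resembles a client-side redirect. Is there a server-side counterpart? There is: when the server hosting the model receives the prompt sent by the Agent, it can carry out tool calls itself whenever needed. ServerToolCall and ServerToolCallChunk describe such server-executed tool calls. Their members closely resemble those of ToolCall and ToolCallChunk, and their types are "server_tool_call" and "server_tool_call_chunk".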

    class ServerToolCall(TypedDict):
        type: Literal["server_tool_call"]
        id: str
        name: str
        args: dict[str, Any]
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]

    class ServerToolCallChunk(TypedDict):
        type: Literal["server_tool_call_chunk"]
        name: NotRequired[str]
        args: NotRequired[str]
        id: NotRequired[str]
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]

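ServerToolCall and ServerToolCallChunk are usually associated with MCP or remote tool services, where they describe the request sent to a remote tool server. The server hosting the model can send an RPC instruction to an independently running "tool server" to execute a given tool remotely. For example, when LangGraph's ToolNode is connected to a hosted MCP server (such as a database query service), the system turns the "service call intent" generated by the model into an instruction sent to that server.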
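Note the difference in the args field: a ServerToolCallChunk carries args as a string fragment, while a complete ServerToolCall carries args as a dictionary. The sketch below folds the streamed chunks of a single call into one complete call. The helper merge_server_tool_call_chunks is hypothetical, and it assumes that the fragments concatenate into one JSON object and that all chunks passed in belong to the same call, which is how streamed tool-call arguments commonly work but is not stated in the source.

```python
from __future__ import annotations

import json
from typing import Any


def merge_server_tool_call_chunks(chunks: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold the streamed chunks of ONE call into a server_tool_call-shaped dict."""
    call_id = None
    name = None
    arg_fragments: list[str] = []
    for chunk in chunks:
        # id and name usually arrive once; keep the first non-empty value seen.
        call_id = call_id or chunk.get("id")
        name = name or chunk.get("name")
        arg_fragments.append(chunk.get("args") or "")
    raw_args = "".join(arg_fragments)
    return {
        "type": "server_tool_call",
        "id": call_id,
        "name": name,
        # Assumption: the concatenated fragments form one JSON object.
        "args": json.loads(raw_args) if raw_args else {},
    }


chunks = [
    {"type": "server_tool_call_chunk", "id": "srv_1", "name": "web_search", "args": '{"query": "Lang'},
    {"type": "server_tool_call_chunk", "args": 'Chain"}'},
]
print(merge_server_tool_call_chunks(chunks))
```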

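The result of a server-driven tool call is represented by a ServerToolResult object, whose type is "server_tool_result". Its tool_call_id, status, and output fields give the identifier of the tool call, its status, and its output.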

    class ServerToolResult(TypedDict):
        type: Literal["server_tool_result"]
        id: NotRequired[str]
        tool_call_id: str
        status: Literal["success", "error"]
        output: NotRequired[Any]
        index: NotRequired[int | str]
        extras: NotRequired[dict[str, Any]]
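The tool_call_id field is what ties a result back to the call that produced it. As a rough sketch, the hypothetical helper below pairs each server_tool_call block in a message with its server_tool_result, if one is present. The sample blocks, tool names, and outputs are invented for illustration.

```python
from __future__ import annotations

from typing import Any


def pair_calls_with_results(
    blocks: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], dict[str, Any] | None]]:
    """Match each server_tool_call with the server_tool_result that references it."""
    results = {b["tool_call_id"]: b for b in blocks if b["type"] == "server_tool_result"}
    return [
        (b, results.get(b["id"]))  # None when no result has arrived for this call
        for b in blocks
        if b["type"] == "server_tool_call"
    ]


blocks = [
    {"type": "server_tool_call", "id": "srv_1", "name": "web_search", "args": {"query": "LangChain"}},
    {"type": "server_tool_result", "tool_call_id": "srv_1", "status": "success", "output": ["result A"]},
    {"type": "server_tool_call", "id": "srv_2", "name": "code_runner", "args": {"code": "1/0"}},
    {"type": "text", "text": "Here is what I found."},
]
for call, result in pair_calls_with_results(blocks):
    status = result["status"] if result else "pending"
    print(call["name"], "->", status)
```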