
Envoy AI Gateway Custom Resources Explained: Configuring AIGatewayRoute and InferencePool

news 2026/7/15 8:30:22


Project repository (ai-gateway): https://gitcode.com/gh_mirrors/aiga/ai-gateway

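Envoy AI Gateway is an open source project that uses Envoy Gateway to handle request traffic from application clients to generative AI services. This article walks through the configuration of its two core custom resources, AIGatewayRoute and InferencePool.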

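AIGatewayRoute: The Core Configuration for AI Traffic Routing

AIGatewayRoute is the core Envoy AI Gateway resource for defining routing rules for AI services. It combines multiple AI backend services and attaches them to a Gateway resource, giving clients a single, unified AI API.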

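Main Components of AIGatewayRoute

An AIGatewayRoute consists of the following parts:

ParentRefs: the Gateway resources the route is attached to
Rules: the routing rules themselves, including match conditions and backend references
FilterConfig: parameters for the AI Gateway filter
LLMRequestCosts: how the cost of LLM requests, such as token usage, is captured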

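Rule Definitions and Backend References

Each AIGatewayRoute rule can contain multiple match conditions and backend references. The BackendRefs field of a rule supports two kinds of backend reference:

AIServiceBackend (default): references an AI service backend directly
InferencePool: references an inference pool resource, which provides more advanced load balancing and failover

Figure: The Envoy AI Gateway resource model, showing where InferencePool sits in the overall architecture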

AIGatewayRoute Configuration Example

The following is a basic AIGatewayRoute configuration:

    apiVersion: aigateway.envoyproxy.io/v1alpha1
    kind: AIGatewayRoute
    metadata:
      name: example-ai-gateway-route
    spec:
      parentRefs:
        - name: my-gateway
      rules:
        - matches:
            - headers:
                - name: x-ai-eg-model
                  value: gpt-4
          backendRefs:
            - name: openai-backend
              weight: 80
            - name: azure-openai-backend
              weight: 20
              priority: 1
      llmRequestCosts:
        - metadataKey: llm_input_token
          type: InputToken
        - metadataKey: llm_output_token
          type: OutputToken
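To make the matching step concrete, here is a minimal Python sketch of how a rule like the one above could be selected from the x-ai-eg-model header. It is a simplified model for illustration, not the gateway's implementation: the HeaderMatch and Rule classes and the first_matching_rule function are hypothetical names, and only exact header matches with one header per match are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderMatch:
    """Hypothetical stand-in for one exact header match in a rule."""
    name: str
    value: str


@dataclass
class Rule:
    """Hypothetical stand-in for one routing rule."""
    matches: list
    backends: list = field(default_factory=list)


def first_matching_rule(rules, headers):
    """Return the first rule with an exactly matching header, or None."""
    # HTTP header names are case-insensitive, so normalize them once.
    normalized = {name.lower(): value for name, value in headers.items()}
    for rule in rules:
        if any(normalized.get(m.name.lower()) == m.value for m in rule.matches):
            return rule
    return None


rules = [
    Rule(
        matches=[HeaderMatch("x-ai-eg-model", "gpt-4")],
        backends=["openai-backend", "azure-openai-backend"],
    ),
]
```

Calling first_matching_rule(rules, {"x-ai-eg-model": "gpt-4"}) returns the single rule defined here, while a request for any other model name returns None.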

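InferencePool: Intelligent Inference Resource Management

InferencePool is a resource from the Gateway API Inference Extension. It manages a group of inference model endpoints and provides load balancing, failover, and traffic control.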

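Core Features of InferencePool

Endpoint selection: picks the best endpoint based on model load
Failover: detects failed model instances automatically and routes around them
Load balancing: distributes traffic across multiple model instances
Version control: lets multiple versions of the same model coexist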

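Integrating AIGatewayRoute with InferencePool

Keep the following in mind when referencing an InferencePool from an AIGatewayRoute:

Each rule can reference only one InferencePool
InferencePool and AIServiceBackend references cannot be mixed in the same rule
Failover for an InferencePool is handled by the pool's own endpoint picker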
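The first two constraints can be expressed as a small check. The sketch below is illustrative only: validate_backend_refs is a hypothetical helper, not part of Envoy AI Gateway, and it assumes each backend reference is a plain dictionary whose missing kind defaults to AIServiceBackend.

```python
INFERENCE_POOL = "InferencePool"
AI_SERVICE_BACKEND = "AIServiceBackend"


def validate_backend_refs(backend_refs):
    """Return a list of problems for one rule's backendRefs (empty if valid)."""
    # A reference without an explicit kind is treated as the default kind.
    kinds = [ref.get("kind", AI_SERVICE_BACKEND) for ref in backend_refs]
    pool_count = kinds.count(INFERENCE_POOL)

    problems = []
    if pool_count > 1:
        problems.append("a rule may reference only one InferencePool")
    if pool_count and pool_count != len(kinds):
        problems.append(
            "InferencePool and AIServiceBackend cannot be mixed in one rule"
        )
    return problems
```

Running a check like this before applying a manifest surfaces an invalid rule early, without waiting for the resource to be rejected by the cluster.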

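Request Flow Example

Figure: The complete flow of a client request through AIGatewayRoute and InferencePool

The request flow:

The client sends a request to the /completions endpoint
The Kubernetes Gateway selects the appropriate InferencePool based on the model name in the request body
The best model replica is chosen based on model load
The request is routed to the selected InferencePool and model instance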
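The replica selection step can be sketched as follows. This is a conceptual illustration under stated assumptions, not the endpoint picker's actual algorithm: the Replica class, its queued_requests load signal, and pick_replica are all hypothetical, and "best" is simplified to "healthy with the fewest queued requests".

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical view of one model replica in a pool."""
    name: str
    healthy: bool
    queued_requests: int  # assumed load signal; real pickers may use others


def pick_replica(replicas):
    """Pick the healthy replica with the fewest queued requests."""
    # Unhealthy replicas are skipped entirely, which models failover.
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available")
    return min(candidates, key=lambda r: r.queued_requests)
```

Filtering on health before comparing load is what lets the same function cover both failover and load-aware selection.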

Advanced Configuration Tips

Traffic Control and Priority

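Setting weight and priority on backendRefs gives fine-grained control over traffic. Because a rule can reference only one InferencePool, weighted splits and priority-based fallback within a rule apply to AIServiceBackend references. An InferencePool reference instead sets group: inference.networking.k8s.io and kind: InferencePool and must be the only entry in its rule.

    backendRefs:
      - name: openai-backend
        weight: 90
      - name: azure-openai-backend
        weight: 10
        priority: 1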
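How weight and priority interact can be sketched in a few lines of Python. The sketch assumes Envoy-style priority semantics, where lower numbers are preferred and a lower-priority tier is used only when no healthy backend remains in a higher one; the Backend class and choose_backend function are hypothetical names used for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical view of one backend reference."""
    name: str
    weight: int = 1
    priority: int = 0  # assumption: 0 is the most preferred tier
    healthy: bool = True


def choose_backend(backends, rng=random):
    """Pick a backend by weight within the most preferred healthy tier."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    # Only the lowest priority number with a healthy backend is considered.
    top = min(b.priority for b in healthy)
    tier = [b for b in healthy if b.priority == top]
    # Weighted random choice among the backends in that tier.
    return rng.choices(tier, weights=[b.weight for b in tier], k=1)[0]
```

Under these assumptions, weights only split traffic between backends that share a priority, while a backend with a higher priority number acts as a fallback.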

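Request Transformation and Modification

AIGatewayRoute supports modifying request headers and request bodies to meet the requirements of different backend services: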

    backendRefs:
      - name: openai-backend
        headerMutation:
          set:
            - name: X-API-Key
              value: "{{ .Env.OPENAI_API_KEY }}"
        bodyMutation:
          set:
            - path: "temperature"
              value: "0.7"
          remove: ["top_p", "frequency_penalty"]
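The effect of a body mutation like this on a request can be sketched in Python. The sketch makes several assumptions that the configuration above does not confirm: paths name top-level JSON fields only, removals run before sets, and a string value is parsed as JSON when possible. The mutate_body function is a hypothetical helper, not the gateway's code.

```python
import json


def mutate_body(raw_body, set_fields, remove_fields):
    """Apply remove and set operations to a JSON request body."""
    body = json.loads(raw_body)
    # Assumption: removing a field that is absent is not an error.
    for name in remove_fields:
        body.pop(name, None)
    for path, value in set_fields.items():
        # Assumption: "0.7" becomes the number 0.7; non-JSON text stays a string.
        try:
            body[path] = json.loads(value)
        except json.JSONDecodeError:
            body[path] = value
    return json.dumps(body)


original = '{"model": "gpt-4", "temperature": 1.0, "top_p": 0.9}'
mutated = mutate_body(
    original,
    set_fields={"temperature": "0.7"},
    remove_fields=["top_p", "frequency_penalty"],
)
```

Here mutated keeps the model field, carries the overridden temperature, and no longer contains top_p.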

Cost Monitoring and Limits

The LLMRequestCosts configuration captures token usage for monitoring. Combined with Envoy Gateway's BackendTrafficPolicy, it enables token-based rate limiting:

    llmRequestCosts:
      - metadataKey: llm_input_token
        type: InputToken
      - metadataKey: llm_output_token
        type: OutputToken
      - metadataKey: llm_total_token
        type: TotalToken
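The idea behind token-based limiting can be sketched as follows. This is a conceptual model, not how Envoy Gateway implements rate limiting: request_costs and TokenBudget are hypothetical names, the total is assumed to be the sum of input and output tokens, and the budget is assumed to be charged after each response, once the token counts are known.

```python
def request_costs(input_tokens, output_tokens):
    """Build the metadata values for the three configured cost keys."""
    return {
        "llm_input_token": input_tokens,
        "llm_output_token": output_tokens,
        # Assumption: total usage is input plus output tokens.
        "llm_total_token": input_tokens + output_tokens,
    }


class TokenBudget:
    """Hypothetical per-client token budget for one time window."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allows_request(self):
        # New requests are admitted only while budget remains.
        return self.used < self.limit

    def charge(self, cost):
        # Called after the response, when the real token count is known.
        self.used += cost
```

Because the cost of a request is only known once the response arrives, a request that starts within budget can still push usage past the limit; the next request is the one that gets rejected.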