
[CrewAI Series 7] I Used an AI Agent for Performance Testing and Found 1 Critical Bottleneck

Published: 2026/4/24 17:55:53

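Author: 测试员周周 (14-year testing/QA veteran)

>Series: CrewAI Multi-Agent Testing Framework in Practice (part 7 of a planned 24)
>Length: about 4,500 characters
>Reading time: 11 minutes
>Note: all test data in this article comes from real runs; the code comes from my own in-house system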

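Opening: a performance test result that surprised me

This morning I ran a performance test against my own crewai-web-platform system.

Going in, I was confident:

FastAPI is the framework, so performance should be decent
Local deployment, so no network latency
Simple endpoints that only return JSON

The results surprised me:

Scenario 1: health check endpoint (simple GET)
✅ P95: 105ms
✅ QPS: 166

Scenario 2: detailed health check (complex GET)
❌ P95: 2137ms (20x scenario 1!)
❌ QPS: 9.38 (down 94%!)

Same system, comparable load (10 vs. 20 concurrent users), so why a 20x difference? Surfacing gaps like this is exactly what performance testing is for.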

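1. Three common misconceptions about performance testing

Misconception 1: it works, so it performs

❌ Wrong thinking: "The endpoint returns 200, so it should be fine."

✅ Reality:

The endpoint returns 200, but takes 3 seconds to respond
95% of users drop off while waiting
The system has not crashed, but the experience is terrible

Functional testing makes sure the system works; performance testing makes sure it works well.

Misconception 2: optimize after launch

❌ Wrong approach: "Ship first, optimize when problems show up."

✅ The real cost:

By the time you react, users have already left
Emergency fixes can introduce new bugs
Architectural problems are hard to fix after the fact

Misconception 3: performance testing is complicated

❌ Conventional view:

You have to learn JMeter
You have to write load-test scripts
You have to analyze complex reports

✅ The AI Agent approach:

Describe the test scenario in natural language
The AI runs it and generates the report
The bottleneck analysis comes with concrete suggestions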

2. PerformanceTestTool: my performance-testing weapon

This is the tool I implemented in my crewai-web-platform system:

2.1 Core code (abridged)

```python
from crewai.tools import BaseTool
import requests, time, uuid, threading
from concurrent.futures import ThreadPoolExecutor, as_completed
import statistics
from typing import Dict, Any, List, Optional


class PerformanceTestTool(BaseTool):
    """Performance testing tool (optimized version)."""

    name: str = "performance_test"
    description: str = "Run API performance tests; supports concurrent requests and stress testing"

    def _run(
        self, url: str, method: str = "GET",
        concurrent_users: int = 10, iterations: int = 100,
        body: Optional[Dict[str, Any]] = None,
        headers: Optional[Dict[str, str]] = None,
        enable_cache_bust: bool = True
    ) -> Dict[str, Any]:
        """Run the performance test (optimized version)."""

        # 1. Concurrent start control (CountDownLatch equivalent)
        start_gate = threading.Event()

        response_times = []
        success_count = error_count = 0
        status_codes = {}
        lock = threading.Lock()

        # 2. Accurate QPS calculation: track when the first request starts
        #    and when the last request finishes
        actual_start_time = actual_end_time = None
        time_lock = threading.Lock()

        def send_request(request_id):
            # request_id is available for per-request parameterization
            # (not used in this abridged version)
            nonlocal success_count, error_count, actual_start_time, actual_end_time

            start_gate.wait()  # wait for the concurrent start signal

            with time_lock:
                if actual_start_time is None:
                    actual_start_time = time.time()

            try:
                # 3. Cache busting + dynamic parameterization
                test_url = url
                if enable_cache_bust:
                    test_url = f"{url}{'&' if '?' in url else '?'}_cache_bust={uuid.uuid4()}"

                req_start = time.time()
                # Fix: forward the caller's headers (the parameter was previously unused)
                response = requests.request(
                    method, test_url, json=body, headers=headers, timeout=60
                )
                req_end = time.time()

                with lock:
                    response_times.append((req_end - req_start) * 1000)
                    if response.status_code == 200:
                        success_count += 1
                    else:
                        error_count += 1
                    status_codes[response.status_code] = \
                        status_codes.get(response.status_code, 0) + 1

                with time_lock:
                    actual_end_time = time.time()
            except Exception:
                with lock:
                    error_count += 1
                with time_lock:
                    actual_end_time = time.time()

        # 4. Concurrent execution
        with ThreadPoolExecutor(max_workers=concurrent_users) as executor:
            futures = [executor.submit(send_request, i) for i in range(iterations)]
            start_gate.set()  # release all workers once every task is submitted
            for future in as_completed(futures):
                try:
                    future.result()
                except Exception:
                    pass  # already handled inside send_request; do not count twice

        # 5. Compute results
        actual_total_time = (actual_end_time - actual_start_time) if actual_start_time else 0

        if response_times:
            sorted_times = sorted(response_times)
            return {
                "total_requests": iterations,
                "success_rate": f"{success_count/iterations*100:.2f}%",
                "p95_response_time_ms": round(sorted_times[int(len(sorted_times)*0.95)], 2),
                # Guard against division by zero on extremely short runs
                "qps": round(iterations / actual_total_time, 2) if actual_total_time > 0 else 0.0,
                # ... other metrics
            }
        return {"error": "All requests failed"}
```
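The start-gate idea used by the tool can be tried in isolation. The following is a minimal sketch rather than the tool's actual code: `run_with_start_gate` is a hypothetical helper, and the 50 ms pause before opening the gate is an assumption made only so that every worker has reached the gate first.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_start_gate(worker_count: int) -> list[float]:
    """Park `worker_count` workers at a gate, release them together,
    and return the moment each worker got past the gate."""
    start_gate = threading.Event()
    release_times: list[float] = []
    lock = threading.Lock()

    def worker() -> None:
        start_gate.wait()                    # block until the gate opens
        released_at = time.perf_counter()
        with lock:                           # protect the shared list
            release_times.append(released_at)

    with ThreadPoolExecutor(max_workers=worker_count) as executor:
        futures = [executor.submit(worker) for _ in range(worker_count)]
        time.sleep(0.05)                     # assumption: let every worker reach the gate
        start_gate.set()                     # open the gate for all workers at once
        for future in futures:
            future.result()
    return release_times


times = run_with_start_gate(8)
spread_ms = (max(times) - min(times)) * 1000
print(f"{len(times)} workers released within {spread_ms:.2f} ms")
```

Note that `threading.Event` behaves like a gate rather than a true countdown latch: in the tool, only the first `concurrent_users` tasks actually wait at it, and later tasks pass straight through as workers free up.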

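💡 Key optimizations:

Optimization	Implementation	Effect
Concurrent start control	`threading.Event()`	All worker threads fire their first request at the same moment
Accurate QPS calculation	Record the first and last request timestamps	Excludes queue wait time
Cache busting	UUID query parameter	Avoids results inflated by server-side caching
Dynamic parameterization	request_id	Simulates real user behavior
Exception handling	Errors counted inside the worker	Avoids double counting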
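As a side note on the cache-busting step, appending the parameter by string concatenation puts it in the wrong place when the URL carries a `#fragment`. The sketch below shows one way to build the URL with the standard library instead; `add_cache_bust` is a hypothetical helper, not part of the tool shown above.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_bust(url: str, param: str = "_cache_bust") -> str:
    """Return `url` with a unique query parameter appended,
    keeping existing query parameters and any fragment intact."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))  # unique value per call
    return urlunsplit(parts._replace(query=urlencode(query)))


busted = add_cache_bust("http://localhost:8000/health?verbose=1#top")
print(busted)
```

Each call produces a different URL, so two requests never share a cache key, while the original `verbose=1` parameter and the `#top` fragment stay where they belong.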

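3. Real tests: how does my system perform?

Test date: 2026-04-24 (optimized version)

System under test: crewai-web-platform (FastAPI backend)

Test environment: local deployment

Optimizations: CountDownLatch-style concurrent start + cache busting + accurate QPS calculation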

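3.1 Scenario 1: baseline test of a simple endpoint

```python
# Test configuration
url = "http://localhost:8000/health"
concurrent_users = 10
iterations = 50
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

Metric	Value	Rating
P95 response time	105.86ms	✅ Good
Success rate	100.00%	✅ Perfect
QPS	166.74	✅ High

Conclusion: the simple endpoint performs well, as expected.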

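3.2 Scenario 2: performance test of a complex endpoint

```python
# Test configuration
url = "http://localhost:8000/health/detailed"
concurrent_users = 20
iterations = 50
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

Metric	Value	Rating
P95 response time	2137.9ms	❌ Dangerous
Success rate	100.00%	✅ Perfect
QPS	9.38	❌ Very low

Compared with scenario 1:

Response time: 105ms → 2137ms (20x higher)
QPS: 166 → 9.38 (down 94%)

Locating the problem:

What does the detailed health check endpoint actually do?

```python
import os
import psutil

@app.get("/health/detailed")
async def detailed_health_check():
    # 1. Sample CPU usage (blocks for 0.1 seconds)
    cpu_percent = psutil.cpu_percent(interval=0.1)
    # 2. Collect memory info
    memory = psutil.virtual_memory()
    # 3. Collect disk info
    disk = psutil.disk_usage('/')
    # 4. Collect process info
    process = psutil.Process(os.getpid())
    process_memory = process.memory_info().rss
    # ... (the rest of the handler is not shown)
```

Bottleneck found: `psutil.cpu_percent(interval=0.1)` blocks for 0.1 seconds on every call. Because the handler is an `async def`, that blocking call stalls the event loop, so requests are served one at a time. With 20 concurrent requests, the last one in line waits about 20 × 0.1 s = 2 s, and throughput is capped near 1 / 0.1 s = 10 QPS, which is consistent with the measured 9.38 QPS.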
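The serialization effect can be reproduced without FastAPI or psutil. The sketch below is an illustration under assumptions: `time.sleep` stands in for the blocking `psutil.cpu_percent(interval=...)` call, and the handler names are hypothetical. It compares a coroutine that blocks the event loop with one that hands the same work to a worker thread.

```python
import asyncio
import time

BLOCK_SECONDS = 0.05  # stand-in for the blocking sampling interval


async def blocking_handler() -> None:
    # Blocks the event loop: no other coroutine can run meanwhile
    time.sleep(BLOCK_SECONDS)


async def offloaded_handler() -> None:
    # Runs the blocking call in a worker thread; the event loop stays free
    await asyncio.to_thread(time.sleep, BLOCK_SECONDS)


async def timed(handler, concurrent_requests: int) -> float:
    """Run `concurrent_requests` handlers concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(concurrent_requests)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocked = await timed(blocking_handler, 10)
    offloaded = await timed(offloaded_handler, 10)
    return blocked, offloaded


blocked_s, offloaded_s = asyncio.run(main())
print(f"blocking: {blocked_s:.2f}s, offloaded: {offloaded_s:.2f}s")
```

The blocking variant takes at least 10 × 0.05 s because the calls run back to back, while the offloaded variant overlaps them across threads and finishes much sooner.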

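3.3 Scenario 3: high-concurrency stress test

```python
# Test configuration
url = "http://localhost:8000/health"
concurrent_users = 50
iterations = 100
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

Metric	Value	Rating
P95 response time	364.29ms	✅ Good
Success rate	100.00%	✅ Perfect
QPS	212.37	✅ Higher

Unexpected finding: raising concurrency from 10 to 50 increased QPS instead of lowering it (166→212)!

Analysis: FastAPI handles requests asynchronously, so at 10 concurrent users the server still had spare capacity, and higher concurrency used it more fully. The cost shows up in latency: P95 rose from 105ms to 364ms.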
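One way to read these numbers together is Little's law: when all workers stay busy, concurrency ≈ QPS × mean response time, so the mean time per request is roughly concurrency / QPS. The sketch below applies that to the three measured scenarios. It is only an approximation, since short runs of 50 to 100 requests include a ramp-down tail where not every worker is busy, and `implied_mean_latency_ms` is a hypothetical helper.

```python
# (concurrent users, measured QPS) taken from the three scenarios in this article
scenarios = {
    "scenario 1 (/health, 10 users)": (10, 166.74),
    "scenario 2 (/health/detailed, 20 users)": (20, 9.38),
    "scenario 3 (/health, 50 users)": (50, 212.37),
}


def implied_mean_latency_ms(concurrency: int, qps: float) -> float:
    """Little's law rearranged: mean latency = concurrency / throughput."""
    return concurrency / qps * 1000


for name, (users, qps) in scenarios.items():
    latency = implied_mean_latency_ms(users, qps)
    print(f"{name}: ~{latency:.0f} ms per request")
```

Read this way, going from 10 to 50 users raised throughput by about 27% while the implied time per request grew roughly fourfold, which suggests the simple endpoint is already close to its throughput ceiling at 50 users.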

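4. Performance comparison at a glance

Scenario	Endpoint	Concurrency	P95	QPS	Bottleneck
Scenario 1	`/health`	10	105ms	166	None
Scenario 2	`/health/detailed`	20	2137ms	9.38	psutil blocking
Scenario 3	`/health`	50	364ms	212	None

Key findings:

1. Simple endpoints perform well: P95 < 150ms, QPS > 160
2. The complex endpoint has a critical bottleneck: the blocking psutil call causes a 20x performance drop
3. The system handles concurrency well: P95 stays under 400ms at 50 concurrent users
4. The optimized QPS figure is more accurate: it excludes queue wait time and reflects real performance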

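5. Five practical tips for performance optimization

Based on this test, I put together 5 optimization tips (some are already implemented in my system):

Tip 1: avoid blocking calls

```python
# ❌ Wrong (blocks for 0.1 seconds)
cpu_percent = psutil.cpu_percent(interval=0.1)

# ✅ Right (non-blocking)
# Note: with interval=None the very first call returns a meaningless 0.0;
# later calls report usage since the previous call.
cpu_percent = psutil.cpu_percent(interval=None)
```

My system already implements this optimization.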

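Tips 2 & 3: the ultimate fix (async + caching)

Combining asynchronous execution with caching, I refactored the monitoring code to eliminate the blocking problem: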

```python
import time
import psutil
import asyncio


class SystemMonitor:
    def __init__(self):
        self._cache = {"data": None, "expires_at": 0}

    def get_system_info_sync(self):
        """Synchronous collection (slow; runs in a background thread)."""
        return {
            "cpu": psutil.cpu_percent(interval=0.1),  # 💡 a background thread can afford an interval; more accurate data
            "memory": psutil.virtual_memory().percent,
            "disk": psutil.disk_usage('/').percent,
            "timestamp": time.time()
        }

    async def get_system_info(self, cache_ttl=5.0):
        """Fetch system info asynchronously (with caching)."""
        now = time.time()
        # 1. Check the cache
        if now < self._cache["expires_at"] and self._cache["data"] is not None:
            return self._cache["data"]

        # 2. Run the slow collection in the default executor so it does not block the event loop
        loop = asyncio.get_running_loop()
        new_data = await loop.run_in_executor(None, self.get_system_info_sync)

        # 3. Update the cache
        self._cache["data"] = new_data
        self._cache["expires_at"] = now + cache_ttl
        return new_data


# Usage in FastAPI (`app` is the existing FastAPI instance)
monitor = SystemMonitor()

@app.get("/health")
async def health_check():
    try:
        # 3-second timeout so a stuck collection cannot hang the endpoint
        data = await asyncio.wait_for(monitor.get_system_info(), timeout=3.0)
        return {"status": "healthy", "metrics": data}
    except asyncio.TimeoutError:
        return {"status": "degraded", "message": "Metrics collection timeout"}
```

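What the optimization achieves:

Optimization	How it works	Effect
Event loop not blocked	`run_in_executor` hands the work to a background thread	FastAPI keeps handling other requests
Lower sampling frequency	5-second TTL cache	Avoids high-frequency `psutil` calls
Accurate data	A background thread can use `interval=0.1`	Fixes the 0.0 reading on the first call
Avalanche protection	`asyncio.wait_for` timeout	A stuck collection does not hold up the endpoint's response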
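One gap worth noting in a TTL cache like this: when the entry expires under load, every request that arrives before the refresh finishes starts its own collection. A possible refinement, sketched below with the standard library only, is to let a single caller refresh while the others wait for its result. `SingleFlightCache` and `fake_collect` are hypothetical names for illustration and are not part of the author's system.

```python
import asyncio
import time
from typing import Any, Callable, Optional


class SingleFlightCache:
    """TTL cache in which only one caller refreshes an expired entry."""

    def __init__(self, collect: Callable[[], Any], ttl: float = 5.0) -> None:
        self._collect = collect                    # synchronous, possibly slow collector
        self._ttl = ttl
        self._data: Optional[Any] = None
        self._expires_at = 0.0
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the running loop

    def _fresh(self) -> bool:
        return self._data is not None and time.monotonic() < self._expires_at

    async def get(self) -> Any:
        if self._fresh():
            return self._data
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check: another caller may have refreshed while we waited
            if self._fresh():
                return self._data
            self._data = await asyncio.to_thread(self._collect)
            self._expires_at = time.monotonic() + self._ttl
            return self._data


calls = 0

def fake_collect() -> dict:
    """Stand-in for a slow metrics collection; counts how often it runs."""
    global calls
    calls += 1
    time.sleep(0.05)
    return {"sample": calls}


async def main() -> list:
    cache = SingleFlightCache(fake_collect, ttl=5.0)
    return await asyncio.gather(*(cache.get() for _ in range(10)))


results = asyncio.run(main())
print(f"{len(results)} callers served by {calls} collection(s)")
```

The second freshness check inside the lock is what prevents the waiting callers from repeating the collection once the first one has filled the cache.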

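Tip 4: set timeouts

```python
# Unified timeout configuration for all endpoints
@app.get("/health/detailed")
async def detailed_health_check():
    try:
        # 5-second timeout; collect_system_metrics() is assumed to be
        # an async function defined elsewhere
        result = await asyncio.wait_for(
            collect_system_metrics(), timeout=5.0
        )
        return result
    except asyncio.TimeoutError:
        return {"status": "timeout", "message": "Metrics collection timed out"}
```

This is a theoretical suggestion; my system does not implement it yet.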

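Tip 5: monitor continuously

```python
import time

# Add a performance-monitoring middleware
@app.middleware("http")
async def add_performance_metrics(request, call_next):
    start_time = time.time()
    response = await call_next(request)
    # Elapsed handling time in milliseconds, exposed as a response header
    process_time = (time.time() - start_time) * 1000
    response.headers["X-Process-Time"] = str(process_time)
    return response
```

This is a theoretical suggestion; my system does not implement it yet.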
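To show what such a timing middleware does underneath, here is a framework-free sketch written against the plain ASGI interface, exercised with a tiny fake app instead of a real server. `ProcessTimeMiddleware`, `demo_app`, and `call_once` are hypothetical names for illustration, not code from the author's system.

```python
import asyncio
import time


class ProcessTimeMiddleware:
    """Pure ASGI middleware that adds an x-process-time header (milliseconds)."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":          # pass through non-HTTP traffic untouched
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()

        async def send_with_timing(message):
            # Response headers travel in the "http.response.start" message
            if message["type"] == "http.response.start":
                elapsed_ms = (time.perf_counter() - start) * 1000
                headers = list(message.get("headers", []))
                headers.append((b"x-process-time", f"{elapsed_ms:.2f}".encode()))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_timing)


async def demo_app(scope, receive, send):
    """Minimal ASGI app standing in for a real endpoint."""
    await asyncio.sleep(0.01)
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call_once(app):
    """Drive one fake HTTP request through `app` and capture what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/health"}, receive, send)
    return sent


messages = asyncio.run(call_once(ProcessTimeMiddleware(demo_app)))
response_headers = dict(messages[0]["headers"])
print(response_headers[b"x-process-time"].decode(), "ms")
```

A class of this shape can typically be registered on a FastAPI application with `app.add_middleware(ProcessTimeMiddleware)`, as an alternative to the decorator form.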

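6. Five pitfalls to avoid in performance testing (general experience)

Pitfall 1: looking only at the average

❌ Wrong: "Average response time is 100ms, target met!"

✅ Right:
"P95 response time is 2000ms, so 5% of users are having a terrible experience!"

Averages lie; P95 doesn't.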
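A small worked example makes the gap concrete. The latency sample below is hypothetical (94 fast requests and 6 slow ones), chosen only to illustrate the effect, and `percentile_nearest_rank` is an illustrative helper using the nearest-rank definition of a percentile.

```python
import math
import statistics


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample in milliseconds: 94 fast requests, 6 slow ones
latencies_ms = [50.0] * 94 + [2000.0] * 6

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p95_ms = percentile_nearest_rank(latencies_ms, 95)

print(f"mean={mean_ms:.0f}ms median={median_ms:.0f}ms p95={p95_ms:.0f}ms")
```

The mean lands well under 200ms and the median is unaffected, yet the P95 is the full 2000ms experienced by the slowest requests.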

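Pitfall 2: unrealistic concurrency settings

❌ Wrong: test with 10 concurrent users, see no problems, and ship

✅ Right:
Set concurrency to 1.5-2x the expected traffic

Pitfall 3: ignoring network latency

❌ Wrong: test only locally, never against production

✅ Right:
Load-test in production or a staging environment that mirrors it

Pitfall 4: no baseline to compare against

❌ Wrong: test once and call it done

✅ Right:
Load-test after every optimization and let the data speak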
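A baseline comparison can be automated with a few lines. The sketch below is one possible shape, with a hypothetical `compare_to_baseline` helper and an assumed 10% tolerance; it reuses the result keys returned by the tool (`p95_response_time_ms`, `qps`). The inputs are this article's scenario 1 and scenario 2 figures, used purely as sample numbers. In practice both runs would target the same endpoint before and after a change.

```python
def compare_to_baseline(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Return a list of regressions where `current` is worse than `baseline`
    by more than `tolerance` (a fraction, e.g. 0.10 = 10%)."""
    findings = []
    p95_change = (current["p95_response_time_ms"] - baseline["p95_response_time_ms"]) \
        / baseline["p95_response_time_ms"]
    qps_change = (current["qps"] - baseline["qps"]) / baseline["qps"]
    if p95_change > tolerance:        # higher latency is worse
        findings.append(f"P95 regressed by {p95_change:.0%}")
    if qps_change < -tolerance:       # lower throughput is worse
        findings.append(f"QPS dropped by {-qps_change:.0%}")
    return findings


baseline = {"p95_response_time_ms": 105.86, "qps": 166.74}  # scenario 1 figures
current = {"p95_response_time_ms": 2137.9, "qps": 9.38}     # scenario 2 figures

for finding in compare_to_baseline(baseline, current):
    print(finding)
```

Storing each run's result dictionary and feeding consecutive runs through a check like this turns "let the data speak" into a repeatable step.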

Pitfall 5: not simulating real scenarios

❌ Wrong: load-test a single endpoint only

✅ Right:
Simulate the complete user journey

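7. Summary

Key takeaways:

1. Concurrent testing - ThreadPoolExecutor provides the concurrency
2. Statistical metrics - mean, P95, P99, QPS
3. Thread safety - a Lock protects shared data
4. Bottleneck isolation - use comparison tests to find problems
5. Continuous optimization - make decisions based on data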

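🎯 Over to you

Which pitfalls have you hit in performance testing?

A. The system crashed after launch and the boss was furious
B. Load tests passed, then it fell over as soon as it went live
C. Could not find the bottleneck and stayed up all night debugging
D. Something else (tell us in the comments)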

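📊 Test data statement

All test data in this article comes from real runs:

Test date: 2026-04-24
System under test: crewai-web-platform
Test script: `crewai-web-platform/run_perf_test_business_v2.py`
Result file: `perf_test_business_v2_20260424_150350.json`

Code optimizations:

1. Concurrent start control: `threading.Event()` provides CountDownLatch-equivalent behavior
2. Accurate QPS calculation: records the actual timestamps of the first and last requests
3. Cache busting: automatically appends a UUID parameter to avoid server-side caching
4. Dynamic parameterization: injects a unique request_id into every request

All data is real and verifiable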

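📚 Series index

No.	Article	Status
01	Getting Started with CrewAI	✅
02	Agent Role Design Methodology	✅
03	Multi-Agent Collaboration Workflow	✅
04	Implementing APITestTool	✅
05	Integrating DatabaseTestTool	✅
06	Test Tool Development in Practice	✅
07	Implementing PerformanceTestTool	✅ This article
08	Integrating UITestTool with Selenium	📝 Next
09	Testing and Validating the Tools	📝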