当前位置：首页 > news >正文

Qwen3-4B-Instruct教程：AutoGen Studio中Agent测试框架搭建、单元测试与回归验证

news 2026/5/11 23:54:49

Qwen3-4B-Instruct教程：AutoGen Studio中Agent测试框架搭建、单元测试与回归验证

1. 环境准备与快速部署

在开始构建AI代理测试框架之前，我们需要确保基础环境已经正确部署。本教程使用的是内置vllm部署的Qwen3-4B-Instruct-2507模型服务的AutoGen Studio环境。

首先检查vllm模型服务是否正常启动：

# 查看模型服务日志 cat /root/workspace/llm.log

如果看到类似下图的输出，说明模型服务已成功启动：

AutoGen Studio是一个低代码界面，专门帮助开发者快速构建AI代理、通过工具增强代理能力、组建代理团队并与之交互完成任务。它基于AutoGen AgentChat构建，提供了构建多代理应用的高级API。

2. 模型配置与基础验证

2.1 配置AssistantAgent模型参数

进入AutoGen Studio的Web界面，我们需要先配置AssistantAgent使用的模型参数。

点击左侧菜单的"Team Builder"，然后编辑AssistantAgent的配置：

在Model Client配置中，设置以下参数：

Model:Qwen3-4B-Instruct-2507
Base URL:http://localhost:8000/v1

配置完成后，发起测试验证模型是否连接成功。如果看到类似下图的响应，说明模型配置成功：

2.2 基础功能测试

现在进入Playground界面，新建一个Session并进行简单的提问测试：

这是一个简单的功能验证，确保模型能够正常响应基本的对话请求。

3. Agent测试框架搭建

3.1 测试环境配置

为了构建可靠的Agent测试框架，我们需要设置专门的测试配置。创建测试配置文件test_config.py：

import os from autogen import AssistantAgent, UserProxyAgent class TestConfig: """测试配置类""" MODEL_NAME = "Qwen3-4B-Instruct-2507" BASE_URL = "http://localhost:8000/v1" # 测试用例超时时间（秒） TEST_TIMEOUT = 300 # 测试数据目录 TEST_DATA_DIR = "./test_data" @staticmethod def create_test_agent(): """创建测试用的AssistantAgent""" return AssistantAgent( name="test_assistant", llm_config={ "model": TestConfig.MODEL_NAME, "base_url": TestConfig.BASE_URL, "api_type": "open_ai", "api_key": "NULL" } ) @staticmethod def create_user_proxy(): """创建测试用的UserProxyAgent""" return UserProxyAgent( name="test_user", human_input_mode="NEVER", code_execution_config={"work_dir": TestConfig.TEST_DATA_DIR} )

3.2 基础测试用例编写

创建基础测试文件test_basic_functionality.py：

import unittest from test_config import TestConfig class BasicFunctionalityTest(unittest.TestCase): """基础功能测试类""" def setUp(self): """测试前准备""" self.assistant = TestConfig.create_test_agent() self.user_proxy = TestConfig.create_user_proxy() def test_simple_conversation(self): """测试简单对话功能""" # 初始化对话 self.user_proxy.initiate_chat( self.assistant, message="你好，请介绍一下你自己", clear_history=True ) # 验证响应不为空 last_message = self.user_proxy.last_message() self.assertIsNotNone(last_message) self.assertGreater(len(last_message["content"]), 10) def test_code_generation(self): """测试代码生成能力""" self.user_proxy.initiate_chat( self.assistant, message="请用Python写一个计算斐波那契数列的函数", clear_history=True ) last_message = self.user_proxy.last_message() # 检查是否包含Python代码 self.assertIn("def", last_message["content"]) self.assertIn("fib", last_message["content"].lower()) def tearDown(self): """测试后清理""" self.user_proxy.reset()

4. 单元测试体系构建

4.1 测试用例组织结构

建立完整的测试目录结构：

tests/ ├── unit/ # 单元测试 │ ├── __init__.py │ ├── test_basic.py │ ├── test_code_generation.py │ └── test_reasoning.py ├── integration/ # 集成测试 │ ├── __init__.py │ ├── test_multi_agent.py │ └── test_tool_usage.py ├── regression/ # 回归测试 │ ├── __init__.py │ └── test_regression.py └── conftest.py # 测试配置

4.2 核心单元测试示例

创建代码生成能力测试tests/unit/test_code_generation.py：

import unittest import re from test_config import TestConfig class CodeGenerationTest(unittest.TestCase): """代码生成能力测试""" def setUp(self): self.assistant = TestConfig.create_test_agent() self.user_proxy = TestConfig.create_user_proxy() def test_python_function_generation(self): """测试Python函数生成""" test_cases = [ { "request": "写一个Python函数计算阶乘", "expected_keywords": ["def", "factorial", "n", "return"] }, { "request": "创建一个处理字符串反转的函数", "expected_keywords": ["def", "reverse", "str", "return"] } ] for case in test_cases: with self.subTest(case=case["request"]): self.user_proxy.initiate_chat( self.assistant, message=case["request"], clear_history=True ) response = self.user_proxy.last_message()["content"] # 验证包含预期的关键词 for keyword in case["expected_keywords"]: self.assertIn(keyword, response) def test_code_correctness(self): """测试生成代码的正确性""" self.user_proxy.initiate_chat( self.assistant, message="写一个函数检查数字是否为素数", clear_history=True ) response = self.user_proxy.last_message()["content"] # 提取代码块 code_blocks = re.findall(r'```python\n(.*?)\n```', response, re.DOTALL) self.assertTrue(len(code_blocks) > 0, "未找到Python代码块")

4.3 推理能力测试

创建推理能力测试tests/unit/test_reasoning.py：

import unittest from test_config import TestConfig class ReasoningTest(unittest.TestCase): """推理能力测试""" def setUp(self): self.assistant = TestConfig.create_test_agent() self.user_proxy = TestConfig.create_user_proxy() def test_logical_reasoning(self): """测试逻辑推理能力""" reasoning_problems = [ { "problem": "如果所有的猫都会爬树，而咪咪是一只猫，那么咪咪会爬树吗？", "expected_answer": "会" }, { "problem": "小明比小红高，小红比小刚高，那么谁最高？", "expected_answer": "小明" } ] for problem in reasoning_problems: with self.subTest(problem=problem["problem"]): self.user_proxy.initiate_chat( self.assistant, message=problem["problem"], clear_history=True ) response = self.user_proxy.last_message()["content"] # 检查是否包含预期答案 self.assertIn(problem["expected_answer"], response)

5. 集成测试与多Agent协作

5.1 多Agent协作测试

创建多Agent协作测试tests/integration/test_multi_agent.py：

import unittest from autogen import GroupChat, GroupChatManager from test_config import TestConfig class MultiAgentTest(unittest.TestCase): """多Agent协作测试""" def setUp(self): # 创建多个Agent self.assistant1 = TestConfig.create_test_agent() self.assistant2 = AssistantAgent( name="specialist", llm_config={ "model": TestConfig.MODEL_NAME, "base_url": TestConfig.BASE_URL, "api_type": "open_ai", "api_key": "NULL" }, system_message="你是一个专业的问题解决专家" ) self.user_proxy = TestConfig.create_user_proxy() # 创建群组聊天 self.groupchat = GroupChat( agents=[self.user_proxy, self.assistant1, self.assistant2], messages=[], max_round=6 ) self.manager = GroupChatManager(groupchat=self.groupchat) def test_multi_agent_collaboration(self): """测试多Agent协作解决问题""" self.user_proxy.initiate_chat( self.manager, message="我们需要设计一个完整的用户注册系统，包括前端和后端，请给出设计方案", clear_history=True ) # 检查对话轮次 self.assertGreaterEqual(len(self.groupchat.messages), 4) # 检查是否有多个Agent参与 participants = set(msg["name"] for msg in self.groupchat.messages) self.assertGreaterEqual(len(participants), 2)

5.2 工具使用测试

创建工具使用测试tests/integration/test_tool_usage.py：

import unittest from test_config import TestConfig class ToolUsageTest(unittest.TestCase): """工具使用能力测试""" def setUp(self): self.assistant = TestConfig.create_test_agent() self.user_proxy = TestConfig.create_user_proxy() def test_data_processing_request(self): """测试数据处理相关的工具使用""" self.user_proxy.initiate_chat( self.assistant, message="我有一个CSV文件，请帮我写代码读取并分析数据", clear_history=True ) response = self.user_proxy.last_message()["content"] # 检查是否提到相关的Python库 expected_libraries = ["pandas", "csv", "numpy"] found_libraries = [lib for lib in expected_libraries if lib in response.lower()] self.assertGreaterEqual(len(found_libraries), 1, f"预期至少提到一个数据处理库，但得到: {response}")

6. 回归测试与持续验证

6.1 回归测试套件

创建回归测试文件tests/regression/test_regression.py：

import unittest import json import os from datetime import datetime from test_config import TestConfig class RegressionTest(unittest.TestCase): """回归测试套件""" def setUp(self): self.assistant = TestConfig.create_test_agent() self.user_proxy = TestConfig.create_user_proxy() self.test_results_dir = "./test_results" os.makedirs(self.test_results_dir, exist_ok=True) def test_critical_functionality(self): """关键功能回归测试""" critical_tests = [ {"query": "你好，请做自我介绍", "min_length": 50}, {"query": "用Python写一个Hello World程序", "keywords": ["print", "Hello"]}, {"query": "解释一下机器学习是什么", "keywords": ["学习", "数据", "算法"]} ] results = [] for test in critical_tests: with self.subTest(query=test["query"]): self.user_proxy.initiate_chat( self.assistant, message=test["query"], clear_history=True ) response = self.user_proxy.last_message()["content"] # 验证响应质量 if "min_length" in test: self.assertGreaterEqual(len(response), test["min_length"]) if "keywords" in test: for keyword in test["keywords"]: self.assertIn(keyword, response) results.append({ "query": test["query"], "response_length": len(response), "timestamp": datetime.now().isoformat(), "status": "passed" }) # 保存测试结果 self._save_test_results(results) def _save_test_results(self, results): """保存测试结果""" filename = f"regression_test_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json" filepath = os.path.join(self.test_results_dir, filename) with open(filepath, 'w', encoding='utf-8') as f: json.dump({ "test_date": datetime.now().isoformat(), "model": TestConfig.MODEL_NAME, "results": results }, f, ensure_ascii=False, indent=2)

6.2 自动化测试脚本

创建自动化测试运行脚本run_tests.py：

#!/usr/bin/env python3 """ 自动化测试运行脚本 """ import unittest import sys import os from datetime import datetime def run_all_tests(): """运行所有测试""" # 添加测试目录到Python路径 sys.path.append(os.path.dirname(os.path.abspath(__file__))) # 发现并运行测试 loader = unittest.TestLoader() start_dir = os.path.join(os.path.dirname(__file__), 'tests') suite = loader.discover(start_dir, pattern='test_*.py') # 运行测试 runner = unittest.TextTestRunner(verbosity=2) result = runner.run(suite) # 输出测试结果摘要 print(f"\n{'='*50}") print(f"测试完成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}") print(f"运行测试数: {result.testsRun}") print(f"失败数: {len(result.failures)}") print(f"错误数: {len(result.errors)}") print(f"{'='*50}") return result.wasSuccessful() if __name__ == '__main__': success = run_all_tests() sys.exit(0 if success else 1)

7. 测试框架优化与最佳实践

7.1 测试数据管理

为了确保测试的可靠性和可重复性，我们需要建立测试数据管理系统：

import json import os from pathlib import Path class TestDataManager: """测试数据管理类""" def __init__(self, base_dir="./test_data"): self.base_dir = Path(base_dir) self.base_dir.mkdir(exist_ok=True) def save_test_case(self, category, name, input_data, expected_output=None): """保存测试用例""" category_dir = self.base_dir / category category_dir.mkdir(exist_ok=True) test_case = { "input": input_data, "expected_output": expected_output, "timestamp": datetime.now().isoformat() } filepath = category_dir / f"{name}.json" with open(filepath, 'w', encoding='utf-8') as f: json.dump(test_case, f, ensure_ascii=False, indent=2) def load_test_cases(self, category): """加载特定类别的所有测试用例""" category_dir = self.base_dir / category if not category_dir.exists(): return [] test_cases = [] for filepath in category_dir.glob("*.json"): with open(filepath, 'r', encoding='utf-8') as f: test_cases.append(json.load(f)) return test_cases

7.2 性能监控与告警

添加性能监控功能以确保测试的稳定性：

import time import statistics from functools import wraps def monitor_performance(func): """性能监控装饰器""" @wraps(func) def wrapper(*args, **kwargs): start_time = time.time() result = func(*args, **kwargs) end_time = time.time() execution_time = end_time - start_time wrapper.execution_times.append(execution_time) if len(wrapper.execution_times) > 10: avg_time = statistics.mean(wrapper.execution_times[-10:]) if execution_time > avg_time * 2: print(f"警告: {func.__name__} 执行时间异常: {execution_time:.2f}s") return result wrapper.execution_times = [] return wrapper # 在测试方法上使用性能监控 class PerformanceTest(unittest.TestCase): """性能测试类""" @monitor_performance def test_response_time(self): """测试响应时间性能""" start_time = time.time() self.user_proxy.initiate_chat( self.assistant, message="简单问候测试", clear_history=True ) end_time = time.time() response_time = end_time - start_time self.assertLessEqual(response_time, 30.0, "响应时间超过30秒")