
【Agent】Building the Harness | hermes-agent Framework Components

note

  • hermes-agent implements a complete closed loop of "experience extraction → knowledge storage → smart retrieval → context injection → execution verification → automatic improvement". It is a project with a built-in self-learning loop: not just task summaries, but a closed cycle of persistent memory + skill induction + retrieval + user modeling. Most of it is engineering optimization.
  • The Skills system lets an AI agent accumulate experience the way a human expert does: successful approaches are written down as SOPs, continuously revised through use, and can be shared with others.
  • Background review: memory review, skill review, etc. For example, a background review agent instance is forked asynchronously to judge which parts of the conversation history are worth distilling into a valuable skill/memory. Code: hermes-agent/run_agent.py
  • Agentic RL reward function:
    • e.g. code that executes and passes / a correct retrieval scores 1, otherwise 0
    • the Combined (RLVR + OPD) method from OpenClaw-RL, where the RLVR reward is a weighted sum of three signals (correctness 70% + efficiency 15% + tool usage 15%)

Contents

  • note
  • 1. Building the Harness: Six Components
  • 2. hermes agent
    • 1. Background review agent
    • 2. Agentic RL
  • Reference

1. Building the Harness: Six Components

[On the Harness] How to build a Harness, a full walkthrough of the six components: https://mp.weixin.qq.com/s/HwqEaXSGkcYgUNrzB2okuA

The six components:
1. File system (the workbench)
Role: not just file storage but the agent's "external brain". It holds intermediate results, enables multi-agent collaboration (state shared through files), and integrates with Git for version control and rollback.
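A minimal sketch of file-based state sharing between agents. The file name and schema here are illustrative, not hermes-agent's actual layout:

```python
import json
from pathlib import Path

def write_state(state_file: Path, agent: str, result: dict) -> None:
    """One agent persists an intermediate result to the shared workbench file."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[agent] = result
    state_file.write_text(json.dumps(state, indent=2))

def read_state(state_file: Path, agent: str):
    """Another agent picks the result up from the same file instead of
    re-deriving it; since it is a plain file, Git can version and roll it back."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get(agent)
```

Because state lives in a plain JSON file rather than process memory, agents that never share a process (or even a machine, given a shared volume) can still coordinate.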
2. Bash + sandbox (hands and feet)
Role: enables the "write → run → fix" self-verification loop. The sandbox provides resource isolation (e.g. Docker) and stops the agent from executing dangerous operations (such as rm -rf); it is what turns the agent from an "advisor" into an "engineer".
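The "write → run → fix" loop can be sketched as follows. This is a toy version with no container isolation; a real harness would run the subprocess inside a sandbox such as Docker:

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, timeout: int = 10):
    """Write candidate code to a temp file and run it in a subprocess,
    returning (exit_code, combined_output). Sandboxing is omitted here."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode, proc.stdout + proc.stderr
    finally:
        os.unlink(path)

def self_verify(candidates):
    """Try successive candidate fixes until one exits cleanly (write -> run -> fix)."""
    output = ""
    for code in candidates:
        rc, output = run_candidate(code)
        if rc == 0:
            return code, output
    return None, output
```

The error output from a failed run is exactly what the agent would feed back into the model to produce the next candidate.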
3. Memory (AGENTS.md, the plug-in brain)
Role: a clever way to "add knowledge without changing weights". The agent writes project conventions and architecture decisions into a Markdown file that is automatically injected into context at the next startup. This is far cheaper than fine-tuning, and the file stays human-readable and human-editable.
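The injection step can be sketched in a few lines. The exact injection format is illustrative; only the AGENTS.md convention itself comes from the text above:

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, memory_path: str = "AGENTS.md") -> str:
    """At startup, append project memory from AGENTS.md (if present) to the
    system prompt, so knowledge persists across sessions without retraining."""
    p = Path(memory_path)
    if not p.exists():
        return base_prompt
    return (
        f"{base_prompt}\n\n# Project memory (AGENTS.md)\n"
        + p.read_text(encoding="utf-8")
    )
```

Because the memory is just Markdown, a human can audit or correct it at any time, which is not possible with knowledge baked into weights.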
4. Web Search + MCP
Role: Web Search solves freshness (e.g. looking up the latest docs); MCP (Model Context Protocol) is Anthropic's "USB port for the AI world", letting the agent plug into databases, Jira, and other internal tools on demand, upgrading it from "search" to "connect".
5. Context engineering (attention management)
Role: fights context rot. Strategies such as compression (summarization), offloading (store large outputs to files and keep only a summary), and layered management keep important information from being drowned out and keep the model "clear-headed".
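The offloading strategy can be sketched as a simple size gate. The threshold and stub format are illustrative choices, not hermes-agent's actual values:

```python
from pathlib import Path

MAX_INLINE_CHARS = 2000   # illustrative threshold
PREVIEW_CHARS = 200

def offload_if_large(output: str, store_dir: Path, name: str) -> str:
    """Compaction by offloading: small tool outputs stay inline; large ones
    are written to a file and replaced in context by a pointer plus preview."""
    if len(output) <= MAX_INLINE_CHARS:
        return output
    path = store_dir / f"{name}.txt"
    path.write_text(output, encoding="utf-8")
    return (
        f"[{len(output)} chars offloaded to {path}; preview below]\n"
        + output[:PREVIEW_CHARS]
    )
```

The agent can always read the full file back later via its file tools, so nothing is lost; only the context window's working set shrinks.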

6. Orchestration + hooks (scheduling and quality control)
Role: orchestration decomposes large tasks and dispatches them to different agents (small models for simple tasks, large models for complex ones); hooks are quality gates that use deterministic rules (lint checks, format validation) to intercept faulty model output and guarantee a quality floor.
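A deterministic hook is just a plain function that accepts or rejects model output before it goes anywhere. As a minimal illustration (a real hook would run a linter or project-specific checks the same way):

```python
import ast

def python_syntax_hook(generated: str):
    """Quality gate: reject model output that is not even valid Python
    before it reaches the executor or the user. Deterministic: same input,
    same verdict, no model call involved."""
    try:
        ast.parse(generated)
        return True, "ok"
    except SyntaxError as e:
        return False, f"rejected: {e.msg} (line {e.lineno})"
```

Because the check is deterministic, it can serve as a hard floor that no amount of model sampling variance can slip past.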

2. hermes agent

1. Background review agent

Whenever the main agent finishes its reply, the interaction appears to be over from the user's point of view. In the background, however, Hermes calls _spawn_background_review to asynchronously launch a review agent: the system immediately forks a new lightweight agent instance dedicated to a deep post-mortem of the conversation that just ended. This background agent never interferes with the foreground user experience; it reviews the interaction from three angles, each driven by its own prompt:

  • Memory review (_MEMORY_REVIEW_PROMPT): what in this conversation is worth remembering? It judges whether the dialogue contains key experience or facts worth keeping long term, distills them into long-term memories, and stores them in the agent's memory store.
  • Skill review (_SKILL_REVIEW_PROMPT): is this task pattern worth turning into a skill? It analyzes whether the solution path for the current task generalizes and deserves to be abstracted into a reusable skill.
  • Combined review (_COMBINED_REVIEW_PROMPT): what could be improved? It reflects on whether the execution contains room for optimization or latent error patterns.

See the prompts in the source (as of 20260429). Note the instruction "THINK CLASS-FIRST. What general pattern of task did the user just complete": skills are generated for task classes, not specific tasks; memory stores user-profile facts such as preferences.

```python
# ------------------------------------------------------------------
# Background memory/skill review
# ------------------------------------------------------------------
_MEMORY_REVIEW_PROMPT = (
    "Review the conversation above and consider saving to memory if appropriate.\n\n"
    "Focus on:\n"
    "1. Has the user revealed things about themselves — their persona, desires, "
    "preferences, or personal details worth remembering?\n"
    "2. Has the user expressed expectations about how you should behave, their work "
    "style, or ways they want you to operate?\n\n"
    "If something stands out, save it using the memory tool. "
    "If nothing is worth saving, just say 'Nothing to save.' and stop."
)

_SKILL_REVIEW_PROMPT = (
    "Review the conversation above and consider whether a skill should be saved or updated.\n\n"
    "Work in this order — do not skip steps:\n\n"
    "1. SURVEY the existing skill landscape first. Call skills_list to see what you "
    "have. If anything looks potentially relevant, skill_view it before deciding. "
    "You are looking for the CLASS of task that just happened, not the exact task. "
    "Example: a successful Tauri build is in the class \"desktop app build "
    "troubleshooting\", not \"fix my specific Tauri error today\".\n\n"
    "2. THINK CLASS-FIRST. What general pattern of task did the user just complete? "
    "What conditions will trigger this pattern again? Describe the class in one "
    "sentence before looking at what to save.\n\n"
    "3. PREFER GENERALIZING AN EXISTING SKILL over creating a new one. If a skill "
    "already covers the class — even partially — update it (skill_manage patch) "
    "with the new insight. Broaden its \"when to use\" trigger if needed.\n\n"
    "4. ONLY CREATE A NEW SKILL when no existing skill reasonably covers the class. "
    "When you create one, name and scope it at the class level "
    "(\"react-i18n-setup\", not \"add-i18n-to-my-dashboard-app\"). The trigger "
    "section must describe the class of situations, not this one session.\n\n"
    "5. If you notice two existing skills that overlap, note it in your response "
    "so a future review can consolidate them. Do not consolidate now unless the "
    "overlap is obvious and low-risk.\n\n"
    "Only act when something is genuinely worth saving. "
    "If nothing stands out, just say 'Nothing to save.' and stop."
)

_COMBINED_REVIEW_PROMPT = (
    "Review the conversation above and consider two things:\n\n"
    "**Memory**: Has the user revealed things about themselves — their persona, "
    "desires, preferences, or personal details? Has the user expressed expectations "
    "about how you should behave, their work style, or ways they want you to operate? "
    "If so, save using the memory tool.\n\n"
    "**Skills**: Was a non-trivial approach used to complete a task that required trial "
    "and error, changing course due to experiential findings, or a different method "
    "or outcome than the user expected? If so, work in this order:\n"
    " a. SURVEY existing skills first (skills_list, then skill_view on candidates).\n"
    " b. Identify the CLASS of task, not the specific task "
    "(\"desktop app build troubleshooting\", not \"fix my Tauri error\").\n"
    " c. PREFER UPDATING/GENERALIZING an existing skill that covers the class.\n"
    " d. ONLY CREATE A NEW SKILL if no existing one covers the class. Scope at "
    "the class level, not this one session.\n"
    " e. If you notice overlapping skills during the survey, note it so a future "
    "review can consolidate them.\n\n"
    "Only act if there's something genuinely worth saving. "
    "If nothing stands out, just say 'Nothing to save.' and stop."
)
```
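The fire-and-forget fork that dispatches these prompts can be sketched with asyncio. This is a stand-in, not hermes-agent's actual _spawn_background_review implementation; the simulated model call and verdict are placeholders:

```python
import asyncio

async def review_agent(transcript, prompt: str) -> str:
    """Stand-in for the lightweight review instance. A real implementation
    would send the transcript plus one of the review prompts to the model;
    here we just simulate the no-op verdict."""
    await asyncio.sleep(0)  # the model call would be awaited here
    return "Nothing to save."

def spawn_background_review(transcript):
    """Fire-and-forget fork: the foreground turn returns to the user
    immediately while the review task runs concurrently."""
    return asyncio.create_task(review_agent(transcript, "_COMBINED_REVIEW_PROMPT"))

async def main_turn() -> str:
    transcript = ["user: fix my build", "assistant: done"]
    task = spawn_background_review(transcript)
    # ... the foreground is free to answer the next user message here ...
    return await task  # awaited only so this demo can observe the verdict
```

Since the task is created rather than awaited at the fork point, review latency never adds to the user-visible response time.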

2. Agentic RL

Reward function:
(1) e.g. code that executes and passes / a correct retrieval scores 1, otherwise 0
(2) the Combined (RLVR + OPD) method from OpenClaw-RL, where the RLVR reward is a weighted sum of three signals (correctness 70% + efficiency 15% + tool usage 15%)
Code: hermes-agent/environments/
Paper: OpenClaw-RL: Train Any Agent Simply by Talking, https://arxiv.org/pdf/2603.10165

Core ideas of OpenClaw-RL:

  • Collect the "next-state signals" (the user following up, correcting, or expressing satisfaction; tool results or errors; etc.) as online training data. Every piece of post-interaction user feedback, tool output, and environment change is turned into an online RL signal, so the agent keeps improving through real use.
  • OpenClaw-RL combined advantage: $A_t^{\text{combined}} = w_{\text{binary}}\, r_{\text{final}} + w_{\text{opd}}\left(\log \pi_{\text{teacher}}(a_t \mid s_{\text{enhanced}}) - \log \pi_\theta(a_t \mid s_t)\right)$
  • The binary reward and the teacher-student distribution gap are weighted into a new advantage estimate:
    • run one forward pass of the teacher model on the hint-enhanced prompt to obtain its logprobs
    • this mitigates the credit-assignment problem, since the OPD term supplies token-level signal where the binary reward is only sequence-level
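The formula above can be computed per token in a few lines. The weights here are illustrative defaults, not the paper's actual values:

```python
def combined_advantage(r_final, teacher_logprobs, student_logprobs,
                       w_binary=0.5, w_opd=0.5):
    """Per-token combined advantage: a sequence-level binary reward broadcast
    to every token, plus the token-level teacher-student logprob gap (OPD).
    teacher_logprobs[i] and student_logprobs[i] are log-probs of the same
    action a_t under the teacher (hint-enhanced state) and the student."""
    return [
        w_binary * r_final + w_opd * (lp_teacher - lp_student)
        for lp_teacher, lp_student in zip(teacher_logprobs, student_logprobs)
    ]
```

Tokens where the teacher (which saw the hint) is more confident than the student get a positive OPD bonus, which is exactly where the token-level credit assignment comes from.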
| Method | Advantage source | Granularity |
|---|---|---|
| RLVR / Binary RL | $r_{\text{final}}$ given by the PRM | response-level / sequence-level |
| OPD | teacher-student logprob gap | token-level |
| Combined | weighted sum of both | mixed |

For reference, the OpenClaw-RL RLVR reward mentioned above:

```python
async def compute_reward(
    self,
    item: dict,
    result: AgentResult,
    ctx: ToolContext,
) -> float:
    """
    Multi-signal reward:
    - correctness (0.7): Did the tests pass?
    - efficiency (0.15): Fewer turns = better
    - tool_usage (0.15): Did the agent actually write + run code?
    """
    cfg = self.config

    # ---- Signal 1: Test correctness ----
    # Check if test_solution.py exists and passes in the agent's sandbox
    correctness = 0.0
    try:
        test_result = ctx.terminal("python test_solution.py 2>&1", timeout=30)
        output = test_result.get("output", "")
        exit_code = test_result.get("exit_code", 1)
        if exit_code == 0 and "passed" in output.lower():
            correctness = 1.0
        elif exit_code == 0:
            correctness = 0.8  # Ran without error but no explicit "passed"
        elif "assert" in output.lower() and "error" in output.lower():
            correctness = 0.2  # Partial — code runs but assertions fail
        else:
            correctness = 0.1  # Code errors out entirely
    except Exception as e:
        logger.debug("Test execution failed in reward: %s", e)
        correctness = 0.0

    # ---- Signal 2: Efficiency ----
    max_turns = cfg.max_agent_turns
    turns_used = result.turns_used
    if turns_used <= 3:
        efficiency = 1.0
    elif turns_used <= max_turns // 2:
        efficiency = 0.8
    elif turns_used <= max_turns * 3 // 4:
        efficiency = 0.5
    else:
        efficiency = 0.2

    # ---- Signal 3: Tool usage ----
    tools_used = set()
    for msg in result.messages:
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            for tc in msg["tool_calls"]:
                fn = tc.get("function", {}) if isinstance(tc, dict) else {}
                name = fn.get("name", "")
                if name:
                    tools_used.add(name)
    # Good: used both terminal and file tools
    if "terminal" in tools_used and ("write_file" in tools_used or "patch" in tools_used):
        tool_usage = 1.0
    elif "terminal" in tools_used:
        tool_usage = 0.6
    elif tools_used:
        tool_usage = 0.3
    else:
        tool_usage = 0.0

    # ---- Combine ----
    reward = (
        cfg.correctness_weight * correctness
        + cfg.efficiency_weight * efficiency
        + cfg.tool_usage_weight * tool_usage
    )
    reward = min(1.0, max(0.0, reward))

    # Track metrics
    self._reward_buffer.append(reward)
    self._correctness_buffer.append(correctness)
    self._efficiency_buffer.append(efficiency)
    self._tool_usage_buffer.append(tool_usage)
    logger.debug(
        "Reward: correctness=%.2f, efficiency=%.2f, tool_usage=%.2f → %.3f",
        correctness,
        efficiency,
        tool_usage,
        reward,
    )
    return reward
```

Reference

[1] Understanding Hermes in one read: how the new top agent self-evolves from experience
[2] https://github.com/NousResearch/hermes-agent
[3] A deep dive into how Hermes Agent achieves "self-evolution" and its Prompt / Context / Harness design practices
[4] https://hermes-agent.nousresearch.com/docs/user-guide/features/rl-training

