
HermesAgent Terminal Tool Windows Compatibility Fix in Practice: Tracking Down and Fixing Two Bugs

news 2026/4/28 12:43:20

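Hi everyone, I'm Zhang Dapeng (张大鹏). I have 10 years of full-stack development experience and currently work on AI online education and training. I have recently been doing secondary development on top of HermesAgent, and on native Windows I ran into a strange problem: every terminal command the Agent ran returned empty output. This article records the full investigation and the fixes. It covers two pitfalls: the Windows compatibility of Python's select.select(), and path format conversion for Git Bash.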

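Symptoms

In the previous article, I installed and ran HermesAgent successfully on native Windows 11. Everything looked fine: API calls worked and the model replied normally.

But when I asked the Agent to run terminal commands, the problem appeared: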

    ● Do you know the absolute path of the current workspace?
    ┊ 💻 preparing terminal…
    ┊ 💻 $ pwd 0.9s
    ┊ 💻 preparing terminal…
    ┊ 💻 $ echo $PWD 0.3s
    ┊ 💻 preparing terminal…
    ┊ 💻 ls -la (54.6s)

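After going around in circles, the Agent finally told me:

Sorry, the current environment has run into some anomalies: all terminal commands return empty output, and file reads do not work either.

The commands really did run (they took time), but every output was empty. The Agent could not get any results itself, so all it could say was "please run it in your own terminal".

That is not acceptable. One of HermesAgent's core capabilities is the terminal tool: it relies on shell commands to read files, install dependencies, and run tests. With the terminal tool broken, the Agent is a chatbot and nothing more.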

Investigation

Step 1: Confirm that Git Bash itself works

On Windows, HermesAgent runs commands through Git Bash (not PowerShell). First, check whether Git Bash itself works:

    $ "C:\Program Files\Git\usr\bin\bash.EXE" -c "pwd"
    /d/code/HermesAgent

No problem here. Git Bash itself works.

Step 2: Call subprocess directly from Python

    import subprocess

    bash_path = r'C:\Program Files\Git\usr\bin\bash.EXE'
    proc = subprocess.Popen(
        [bash_path, '-c', 'echo hello'],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    output = proc.stdout.read()
    print(repr(output))  # 'hello\n'

No problem here either. Python's subprocess captures Git Bash output correctly.

Step 3: Call through HermesAgent's terminal tool

    from tools.environments.local import LocalEnvironment

    env = LocalEnvironment(cwd=r'D:\code\HermesAgent', timeout=10)
    result = env.execute('echo hello')
    print(repr(result.get('output', '')))  # ''
    print(repr(result.get('returncode')))  # 0

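The output is empty, but the return code is 0! The command succeeded; the output just never arrives.

That is odd. Same bash, same command: calling subprocess directly works, but going through HermesAgent's terminal tool loses the output.

Step 4: Read the terminal tool's execution flow

HermesAgent's terminal tool runs a command in three steps:

init_session: starts a bash login shell and snapshots the environment variables to a temporary file
execute: for each command, sources the snapshot file, cds to the working directory, and then runs the command
_wait_for_process: polls the pipe with select.select() and collects the output

The key is the third step. Inside _wait_for_process there is a _drain() function that reads the output from the pipe: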

    def _drain():
        fd = proc.stdout.fileno()
        idle_after_exit = 0
        try:
            while True:
                try:
                    ready, _, _ = select.select([fd], [], [], 0.1)
                except (ValueError, OSError):
                    break  # fd already closed  <- the problem is here!
                if ready:
                    chunk = os.read(fd, 4096)
                    if not chunk:
                        break
                    output_chunks.append(decoder.decode(chunk))
                    idle_after_exit = 0
                elif proc.poll() is not None:
                    idle_after_exit += 1
                    if idle_after_exit >= 3:
                        break
        finally:
            # Flush buffered bytes...

See that except (ValueError, OSError): break?

Step 5: Verify how select.select() behaves on Windows

    import subprocess, select

    bash_path = r'C:\Program Files\Git\usr\bin\bash.EXE'  # same path as in step 2
    proc = subprocess.Popen(
        [bash_path, '-c', 'echo hello'],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    fd = proc.stdout.fileno()
    try:
        ready, _, _ = select.select([fd], [], [], 1.0)
        print('select works! ready:', ready)
    except Exception as e:
        print('select FAILED:', type(e).__name__, e)

Output:

    select FAILED: OSError [WinError 10093] Either the application has not called WSAStartup, or WSAStartup failed.

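Root cause found!

On Windows, Python's select.select() only supports sockets, not pipe file descriptors. This is a Windows limitation: the select() function there comes from the WinSock library, so it works only on network sockets and not on ordinary file descriptors.

So on Windows, the _drain() function does the following:

Calls select.select() → raises OSError
Catches the exception → break exits the loop
The output is never read

The command really did run and its output is sitting in the pipe, but nothing ever reads it.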
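The failure mode can be reproduced on any platform with a small simulation. This is only a sketch: drain_with_break and failing_select are hypothetical names, failing_select merely stands in for select.select() rejecting a pipe descriptor, and the child process is a Python one-liner rather than Git Bash.

```python
import os
import subprocess
import sys


def drain_with_break(proc, select_fn):
    """Mimic the original loop: give up as soon as select() raises."""
    fd = proc.stdout.fileno()
    chunks = []
    while True:
        try:
            ready, _, _ = select_fn([fd], [], [], 0.1)
        except (ValueError, OSError):
            break  # same early exit as in _drain()
        if ready:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def failing_select(rlist, wlist, xlist, timeout):
    """Stand-in (assumption) for select.select() on a Windows pipe."""
    raise OSError("select() does not accept pipe descriptors here")


proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
)
lost = drain_with_break(proc, failing_select)
returncode = proc.wait()
proc.stdout.close()
print(repr(lost), returncode)  # b'' 0
```

The child exits with code 0 and its output reaches the pipe, yet the caller sees an empty result, which matches the symptom seen through the terminal tool.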

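Bug 1 fix: use blocking reads instead of select on Windows

The idea is simple: since select.select() is unusable on Windows, read with a blocking os.read() instead. The parent thread has a timeout mechanism and kills the process when the timeout expires, so the blocking read cannot hang forever.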

    def _drain():
        fd = proc.stdout.fileno()
        idle_after_exit = 0
        try:
            while True:
                try:
                    ready, _, _ = select.select([fd], [], [], 0.1)
                except (ValueError, OSError):
                    # On Windows, select() does not support pipe fds.
                    # Fall back to a blocking read (the parent thread's timeout
                    # kills the process, so this cannot block forever).
                    if platform.system() == "Windows":
                        try:
                            chunk = os.read(fd, 4096)
                        except (ValueError, OSError):
                            break
                        if not chunk:
                            break
                        output_chunks.append(decoder.decode(chunk))
                        continue
                    break  # not Windows: fd already closed
                if ready:
                    # ... normal read logic unchanged

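There is only one core change: when select.select() raises OSError on Windows, the loop no longer breaks immediately but falls back to a blocking os.read() and keeps reading the output.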
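Why a blocking read is safe here can be shown with a standalone sketch. It is not HermesAgent's code: run_with_blocking_read is a hypothetical helper, and it assumes the parent enforces the timeout by waiting on a reader thread and killing the child when time runs out.

```python
import os
import subprocess
import sys
import threading


def run_with_blocking_read(argv, timeout):
    """Read a child's output with blocking os.read() under a parent-side timeout."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    chunks = []

    def drain():
        fd = proc.stdout.fileno()
        while True:
            try:
                chunk = os.read(fd, 4096)  # blocks until data arrives or EOF
            except (ValueError, OSError):
                break
            if not chunk:  # empty bytes: the write end of the pipe was closed
                break
            chunks.append(chunk)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    reader.join(timeout)
    timed_out = reader.is_alive()
    if timed_out:
        proc.kill()  # the child's pipe end closes, so os.read() returns
        reader.join(timeout=1.0)
    returncode = proc.wait()
    proc.stdout.close()
    return b"".join(chunks).decode(errors="replace"), returncode, timed_out


out, code, timed_out = run_with_blocking_read(
    [sys.executable, "-c", "print('hello')"], timeout=10
)
print(repr(out), code, timed_out)
```

One caveat of this design: if the child has spawned grandchildren that still hold the pipe open, killing the child alone may not unblock the read, which is why the second join is bounded.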

I thought it was fixed, and then it blew up again

After fixing the select.select() problem, I tested with full confidence:

    env = LocalEnvironment(cwd=r'D:\code\HermesAgent', timeout=10)
    result = env.execute('echo hello')

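Result:

    NotADirectoryError: [WinError 267] The directory name is invalid.

Wait, it was working a moment ago. Why can't it even run a command now?

New problem: wrong CWD path format

After adding some debug output, I found: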

    env = LocalEnvironment(cwd=r'D:\code\HermesAgent', timeout=10)
    print('cwd:', env.cwd)  # /d/code/HermesAgent

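self.cwd changed from D:\code\HermesAgent to /d/code/HermesAgent!

This is a "side effect" of init_session. init_session runs pwd -P and writes the result to a temporary file, and then _update_cwd reads that file and updates self.cwd.

In Git Bash, pwd -P returns a Unix-style path: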

    $ pwd -P
    /d/code/HermesAgent

But subprocess.Popen(cwd=...) on Windows does not understand /d/code/HermesAgent; it needs D:\code\HermesAgent.

So the execution flow is:

__init__ sets self.cwd = 'D:\code\HermesAgent' (correct)
init_session calls bash, and bash runs pwd -P and writes /d/code/HermesAgent
_update_cwd reads the temporary file and updates self.cwd to /d/code/HermesAgent (wrong!)
On the next execute, subprocess.Popen(cwd='/d/code/HermesAgent') → error
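The mismatch in the last step can be inspected without a Windows machine, because Python's ntpath module applies Windows path rules on any platform. The snippet below is an illustrative sketch; the exact error Windows raises for such a cwd depends on what exists on disk.

```python
import ntpath  # Windows path rules, importable on any platform

git_bash_style = "/d/code/HermesAgent"
windows_style = r"D:\code\HermesAgent"

# splitdrive() returns (drive, rest); an empty drive means "no drive letter".
print(ntpath.splitdrive(git_bash_style))  # ('', '/d/code/HermesAgent')
print(ntpath.splitdrive(windows_style))   # ('D:', '\\code\\HermesAgent')

# Under Windows rules the leading "/d" is just a folder named "d"
# at the root of whatever drive happens to be current.
print(ntpath.normpath(git_bash_style))    # \d\code\HermesAgent
```

In other words, the drive letter that Git Bash encodes as the first path segment is invisible to Windows, so the path points somewhere that most likely does not exist.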

Bug 2 fix: convert Git Bash paths to Windows paths automatically

_update_cwd needs to convert the Git Bash /d/code/... format to the Windows D:\code\... format:

    @staticmethod
    def _git_bash_to_win_path(path: str) -> str:
        """Convert a Git Bash path (/d/code/...) to a Windows path (D:\\code\\...)."""
        import re
        m = re.match(r"^/([a-zA-Z])(/.*)?$", path)
        if m:
            drive = m.group(1).upper()
            rest = (m.group(2) or "").replace("/", "\\")
            return f"{drive}:{rest}"
        return path

Then call it in _update_cwd:

    def _update_cwd(self, result: dict):
        try:
            with open(self._cwd_file) as f:
                cwd_path = f.read().strip()
            if cwd_path:
                if _IS_WINDOWS:
                    cwd_path = self._git_bash_to_win_path(cwd_path)
                self.cwd = cwd_path
        except (OSError, FileNotFoundError):
            pass
        self._extract_cwd_from_output(result)
        # _extract_cwd_from_output may also read a Git Bash path from the marker
        if _IS_WINDOWS:
            self.cwd = self._git_bash_to_win_path(self.cwd)

Note the last three lines: _extract_cwd_from_output also parses a path from the CWD marker in the command output and sets self.cwd, so the conversion has to run once more after it.
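Because _update_cwd may convert the same value twice, it helps to confirm that the conversion is idempotent. The sketch below lifts the same regex into a module-level function (git_bash_to_win_path, a hypothetical standalone name) so it can be run directly.

```python
import re

_DRIVE_PATH = re.compile(r"^/([a-zA-Z])(/.*)?$")


def git_bash_to_win_path(path: str) -> str:
    """Convert /d/code/... to D:\\code\\...; return other paths unchanged."""
    m = _DRIVE_PATH.match(path)
    if not m:
        return path
    drive = m.group(1).upper()
    rest = (m.group(2) or "").replace("/", "\\")
    return f"{drive}:{rest}"


once = git_bash_to_win_path("/d/code/HermesAgent")
twice = git_bash_to_win_path(once)
print(once)                                 # D:\code\HermesAgent
print(twice == once)                        # True
print(git_bash_to_win_path("/tmp/hermes"))  # /tmp/hermes
```

A converted path no longer starts with a slash, so the second pass leaves it alone, and paths such as /tmp/hermes never match because the first segment is longer than one letter. One edge case worth watching: a bare /d becomes D: with no trailing backslash, which Windows treats as the current directory on drive D rather than its root.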

Verifying the fix

With both bugs fixed, the terminal tool finally works:

    env = LocalEnvironment(cwd=r'D:\code\HermesAgent', timeout=10)

    r1 = env.execute('pwd')
    print('pwd:', r1['output'].strip())
    # pwd: /d/code/HermesAgent

    r2 = env.execute('echo hello world')
    print('echo:', r2['output'].strip())
    # echo: hello world

    r3 = env.execute('ls pyproject.toml')
    print('ls:', r3['output'].strip())
    # ls: pyproject.toml

    r4 = env.execute('python --version')
    print('python:', r4['output'].strip())
    # python: Python 3.13.13

    r5 = env.execute('cd hermes_workspace && pwd')
    print('cd:', env.cwd)
    # cd: D:\code\HermesAgent\hermes_workspace

Command output is correct, and CWD tracking is correct too (it is converted to a Windows path automatically).

A real test with hermes:

    $ python -m hermes_cli.main -z "What is the absolute path of the current workspace? Confirm with the pwd command"
    The absolute path of the current workspace is: /d/code/HermesAgent

The Agent can finally run terminal commands and get their output.

Technical Summary

Bug 1: the Windows limitation of select.select()

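Item	Description
Root cause	Python's `select.select()` on Windows only supports sockets, not pipe fds
Symptom	`_drain()` catches `OSError` and exits immediately, so the output is lost
Impact	Every terminal command returns empty output; the Agent cannot perform any shell operation
Fix	Use a blocking `os.read()` on Windows, relying on the parent thread's timeout to avoid hanging forever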

Bug 2: incompatible Git Bash path format

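Item	Description
Root cause	`pwd -P` in Git Bash returns the `/d/code/...` format, which `subprocess.Popen` on Windows does not understand
Symptom	After `init_session`, `self.cwd` becomes a Git Bash path and later commands raise `NotADirectoryError`
Impact	Every command after the first one fails
Fix	Convert `/d/...` to `D:\...` with a regex in `_update_cwd`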

Modified files

tools/environments/base.py: added import platform, modified the _drain() function
tools/environments/local.py: added the _git_bash_to_win_path() method, modified the _update_cwd() method

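Lessons learned

select.select() is only half functional on Windows. It supports sockets but not other file descriptors. If your Python code uses select to poll pipe output, Windows needs special handling.
Git Bash's path format is incompatible with Windows. Git Bash writes D:\code\... as /d/code/..., so paths must be converted whenever a call crosses between the two environments.
"It runs" does not mean "it works". HermesAgent started fine on Windows and API calls worked, but the terminal tool, a core feature, was broken. A quick check such as hermes -z "1+1=?" would never reveal this. The problem only shows up when the Agent is actually asked to carry out a task.
Source-level debugging matters. If I had only used hermes.exe, I would have had no idea what was happening inside. Launching from source (python -m hermes_cli.main) is what made it possible to set breakpoints, add logging, and locate the problem.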
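For code that does not need to stream output incrementally, the standard library offers a way around select altogether. The sketch below is an alternative to the fix described above and is not what HermesAgent does: run_and_capture is a hypothetical helper built on Popen.communicate(timeout=...), which the subprocess documentation describes as portable across platforms.

```python
import subprocess
import sys


def run_and_capture(argv, timeout):
    """Run a command and return (output, returncode) without calling select()."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever was written before the kill.
        output, _ = proc.communicate()
    return output, proc.returncode


output, returncode = run_and_capture(
    [sys.executable, "-c", "print('hello')"], timeout=10
)
print(repr(output), returncode)
```

The trade-off is that output only becomes available once the process exits or is killed, so this suits short commands better than long-running ones whose progress needs to be shown live.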