A Hands-On Walkthrough of OpenManus - 链载Ai

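I have demoed OpenManus a few times before, and those really were just demos. Manus itself is also still at the demo stage: for complex plans and flows, both the supporting code and the capabilities of today's LLMs still need work. This is not a rant, though. In this post we open up OpenManus to see how it is implemented. Manus works in much the same way, and so does OWL, so looking at one of them is enough.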

The other directories need no special attention; we only look at app.

app is organized into the following parts:

1- agent: needs little explanation

2- flow: the framework for multi-agent planning and task management

They all take this form: each one defines a system prompt and the agent's role.

The most important directory, and the one we focus on, is agent.

Broadly, it is split into the following agents:

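Start with base: the base.py module defines the abstract base class BaseAgent. It manages the agent's state, memory, execution loop (running, executing steps, and handling the stuck state), and messages, and it provides initialization and configuration. It is the foundation on which agents with specific behavior are built.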

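Next, planning: the planning.py module defines the PlanningAgent class, which creates, manages, and executes task plans through tools such as PlanningTool and Terminate. It covers initialization, plan creation (create_initial_plan), thinking (think), acting (act), plan status updates (update_plan_status), and step tracking (step_execution_tracker). It adjusts the plan dynamically based on tool execution results, handles the initial request (run), and retrieves the current plan status (get_plan).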

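Then react.py: the react.py module defines ReActAgent, an abstract class that inherits from BaseAgent. It processes and executes tasks through two abstract methods that subclasses must implement, think (decide on the next action) and act (carry out the action), plus a step method that combines think and act. Together these provide a basic task-processing framework.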

swe and tool refer to the code-oriented agent and the tool-calling agent, respectively.

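manus.py: the manus.py module defines the Manus class, a general-purpose agent that inherits from ToolCallAgent. Manus has a predefined name, description, system prompt, and limits (max_observe, max_steps). It performs tasks using the available_tools collection, which includes PythonExecute, WebSearch, BrowserUseTool, FileSaver, and Terminate, and it handles (for example, cleans up) the results of BrowserUseTool through the _handle_special_tool method.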
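
To make the available_tools idea concrete, here is a minimal sketch of a tool registry keyed by name. It is not OpenManus's actual tool collection: the class name `SimpleToolCollection` and the two toy tools (`echo`, `terminate`) are hypothetical and exist only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumption: a "tool" is any async callable that takes keyword args and returns text.
ToolFn = Callable[..., Awaitable[str]]


class SimpleToolCollection:
    """Hypothetical registry that maps tool names to async callables."""

    def __init__(self, **tools: ToolFn) -> None:
        self._tools: Dict[str, ToolFn] = dict(tools)

    async def execute(self, name: str, **kwargs: str) -> str:
        # Look the tool up by name; report unknown tools instead of raising.
        tool = self._tools.get(name)
        if tool is None:
            return f"Error: unknown tool '{name}'"
        return await tool(**kwargs)


async def echo(text: str) -> str:
    # Toy tool standing in for something like a search or code-execution tool.
    return f"echo: {text}"


async def terminate(status: str = "success") -> str:
    # Toy tool standing in for a "stop the agent" tool.
    return f"terminated with status: {status}"


available_tools = SimpleToolCollection(echo=echo, terminate=terminate)


async def main() -> None:
    print(await available_tools.execute("echo", text="hello"))
    print(await available_tools.execute("missing"))


if __name__ == "__main__":
    asyncio.run(main())
```

Looking tools up by name matters because the LLM's tool call only carries a name and arguments; the agent needs some table like this to turn that into a function call.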

# BaseAgent.run
async run(request):
    if state != IDLE:
        raise RuntimeError
    if request:
        update_memory(USER, request)
    async with state_context(RUNNING):
        while current_step < max_steps and state != FINISHED:
            current_step += 1
            step_result = await step()  # calls the subclass's step() method
            if is_stuck():
                handle_stuck_state()
            results.append(step_result)
        if current_step >= max_steps:
            state = IDLE
            results.append("Terminated: Reached max steps")
    return "\n".join(results)
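
The run loop above can be turned into a small runnable sketch. This is a simplified stand-in, not the real BaseAgent: `MiniAgent` and its toy `step()` are hypothetical, memory is a plain list of strings, and stuck detection is left out to keep the example short.

```python
import asyncio
from contextlib import asynccontextmanager
from enum import Enum
from typing import List


class AgentState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"


class MiniAgent:
    """Hypothetical, stripped-down agent that mirrors the run loop above."""

    def __init__(self, max_steps: int = 3) -> None:
        self.state = AgentState.IDLE
        self.max_steps = max_steps
        self.current_step = 0
        self.memory: List[str] = []

    @asynccontextmanager
    async def state_context(self, new_state: AgentState):
        # Switch to new_state for the duration of the block, then restore.
        previous = self.state
        self.state = new_state
        try:
            yield
        finally:
            self.state = previous

    async def step(self) -> str:
        # Toy step: declare the task finished once two steps have run.
        if self.current_step >= 2:
            self.state = AgentState.FINISHED
        return f"step {self.current_step} done"

    async def run(self, request: str = "") -> str:
        if self.state != AgentState.IDLE:
            raise RuntimeError(f"Cannot run from state {self.state}")
        if request:
            self.memory.append(f"user: {request}")
        results: List[str] = []
        async with self.state_context(AgentState.RUNNING):
            while (self.current_step < self.max_steps
                   and self.state != AgentState.FINISHED):
                self.current_step += 1
                results.append(await self.step())
            if self.current_step >= self.max_steps:
                results.append("Terminated: Reached max steps")
        return "\n".join(results)


if __name__ == "__main__":
    print(asyncio.run(MiniAgent(max_steps=5).run("hello")))
```

The context manager is the design point worth noticing: whatever happens inside the loop, the agent leaves the block in the state it entered with, so a later call to `run` is not rejected by the IDLE check.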

# ReActAgent.step
async step():
    should_act = await think()  # calls the subclass's think()
    if should_act:
        return await act()  # calls the subclass's act()
    else:
        return "Thinking complete - no action needed"
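
As a rough illustration of the think/act split, the sketch below defines an abstract class with the same three methods and one toy subclass. The names `MiniReActAgent` and `QueueAgent` are hypothetical; a real agent would call an LLM in `think()` rather than pop from a list.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Optional


class MiniReActAgent(ABC):
    """Hypothetical abstract agent: subclasses supply think() and act()."""

    @abstractmethod
    async def think(self) -> bool:
        """Decide on the next action; return True if there is something to do."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the action chosen by think()."""

    async def step(self) -> str:
        # One step = think first, then act only if thinking produced an action.
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()


class QueueAgent(MiniReActAgent):
    """Toy subclass that 'thinks' by taking the next item from a queue."""

    def __init__(self, actions: List[str]) -> None:
        self.pending: List[str] = list(actions)
        self.next_action: Optional[str] = None

    async def think(self) -> bool:
        self.next_action = self.pending.pop(0) if self.pending else None
        return self.next_action is not None

    async def act(self) -> str:
        return f"did: {self.next_action}"


if __name__ == "__main__":
    agent = QueueAgent(["search the web"])
    print(asyncio.run(agent.step()))
    print(asyncio.run(agent.step()))
```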

# PlanningAgent.run
async run(request):
    if request:
        await create_initial_plan(request)
    return await super().run()  # calls ToolCallAgent.run() -> ReActAgent.run() -> BaseAgent.run()

async create_initial_plan(request):
    # 1. Build the messages that ask the LLM to create a plan
    messages = [...]
    response = await llm.ask_tool(..., tool_choice=ToolChoice.AUTO)
    # 2. Process the LLM response and extract the tool calls (these should be calls to the planning tool)
    for tool_call in response.tool_calls:
        if tool_call.function.name == "planning":
            result = await execute_tool(tool_call)  # run the planning tool
            # 3. Store the tool result (the plan) in memory
            update_memory(TOOL, result, tool_call_id=tool_call.id)

# PlanningAgent.think
async think():
    prompt = f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
    self.messages.append(Message.user_message(prompt))
    self.current_step_index = await self._get_current_step_index()
    result = await super().think()  # calls ToolCallAgent's think()
    if result and self.tool_calls:
        # Record which plan step this tool call belongs to
        latest_tool_call = self.tool_calls[0]
        if latest_tool_call is not the planning tool and not a special tool:
            self.step_execution_tracker[latest_tool_call.id] = {
                "step_index": self.current_step_index,
                "tool_name": latest_tool_call.function.name,
                "status": "pending"
            }
    return result

# PlanningAgent.act
async act():
    result = await super().act()  # calls ToolCallAgent's act()
    if self.tool_calls:
        latest_tool_call = self.tool_calls[0]
        if latest_tool_call.id in self.step_execution_tracker:
            self.step_execution_tracker[latest_tool_call.id]["status"] = "completed"
            self.step_execution_tracker[latest_tool_call.id]["result"] = result
            if latest_tool_call is not the planning tool and not a special tool:
                await self.update_plan_status(latest_tool_call.id)
    return result
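
The bookkeeping that think() and act() do with step_execution_tracker can be isolated into a small sketch. Everything here is an assumption made for illustration: the `StepTracker` class, its method names, and the `SPECIAL_TOOLS` set are hypothetical, and the real agent keeps this data in a plain dict on the agent itself.

```python
from typing import Any, Dict, Optional

# Assumption: tool calls to these tools are never tied to a plan step.
SPECIAL_TOOLS = {"planning", "terminate"}


class StepTracker:
    """Hypothetical helper linking tool-call ids to plan step indices."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, Any]] = {}

    def record_pending(self, call_id: str, tool_name: str, step_index: int) -> bool:
        # Mirrors the think() side: remember which step the tool call serves.
        if tool_name in SPECIAL_TOOLS:
            return False
        self.entries[call_id] = {
            "step_index": step_index,
            "tool_name": tool_name,
            "status": "pending",
        }
        return True

    def mark_completed(self, call_id: str, result: str) -> Optional[int]:
        # Mirrors the act() side: store the result and return the step index
        # so the caller can mark that plan step as completed.
        entry = self.entries.get(call_id)
        if entry is None:
            return None
        entry["status"] = "completed"
        entry["result"] = result
        return entry["step_index"]


if __name__ == "__main__":
    tracker = StepTracker()
    tracker.record_pending("call_1", "web_search", step_index=0)
    print(tracker.mark_completed("call_1", "found 3 pages"))
    print(tracker.entries)
```

Keying the tracker by tool-call id rather than by step index lets the agent go from "this tool call just returned" to "this is the plan step to update" without re-reading the plan.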

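async update_plan_status(tool_call_id):
    # 1. Check that tool_call_id is in the tracker and that its status is completed
    # 2. Call the planning tool's mark_step command to mark the corresponding step as completed

async _get_current_step_index():
    # 1. Get the current plan (as text)
    # 2. Parse the plan text and find the first step marked [ ] or [→]
    # 3. Call the planning tool's mark_step command to set the current step to in_progress
    # 4. Return the step index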
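
Step 2 of _get_current_step_index, finding the first step marked [ ] or [→], could look roughly like the sketch below. The plan text layout is an assumption (one step per line, written as `index. [marker] description`), and `find_current_step_index` is a hypothetical helper name, not a function from the project.

```python
import re
from typing import Optional

# Assumption: each step is a line such as "1. [→] Write draft", where the
# marker is " " (not started), "→" (in progress), or something else (done).
STEP_PATTERN = re.compile(r"^\s*(\d+)\.\s*\[( |→)\]", re.MULTILINE)


def find_current_step_index(plan_text: str) -> Optional[int]:
    """Return the index of the first not-started or in-progress step."""
    match = STEP_PATTERN.search(plan_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    plan = (
        "Plan: write a report\n"
        "0. [✓] Collect sources\n"
        "1. [→] Write draft\n"
        "2. [ ] Review\n"
    )
    print(find_current_step_index(plan))
```

Returning `None` when nothing matches gives the caller a clear signal that every step is done, or that the plan text did not have the expected shape.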

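# think() for the swe agent (inferred from its use of bash and the working directory)
async think():
    self.working_dir = await self.bash.execute("pwd")  # get the current working directory
    self.next_step_prompt = self.next_step_prompt.format(current_dir=self.working_dir)  # update the prompt
    return await super().think()  # calls ToolCallAgent's think()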

Abstractly, the overall call flow looks like the figure below.

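That wraps up today's lesson. With this explanation in hand, you can run the code yourself, or even write a new multi-agent framework along the same lines; it is not especially hard.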

That's all for now. Goodbye!