Agentic Workflow: The 4 Mechanisms of AI Agent Workflows, with a crewAI Code Walkthrough


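Andrew Ng gave a talk on this topic a few months ago, and plenty of articles online have recapped it. In my view most of them are incomplete, more fragments than a full picture. So this post goes through the ideas again myself, tied to crewAI, a framework that has recently become very popular outside China.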

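This post draws on an article by square of dai, whose recap is far more complete than the Chinese-language ones. The idea for my previous post, "ollama + Obsidian: building a custom local AI writing assistant that works offline", also came from him; he deployed his version with LM Studio.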

agentic workflow

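At each step of a workflow a different task is executed, and each task is handled by its own model (some are large language models, some are specialized vertical models) and tools; the models and tools can differ from step to step. Chaining these steps together yields an end-to-end workflow that integrates multiple tools and steps.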

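Before workflows, LLMs were used zero-shot: the input goes in and the output comes straight out in a single pass. With a workflow, the input goes through loops of the workflow, passing back and forth several times, before the final output is produced.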

For example, given the input "write an essay outline on topic X", the non-workflow approach on the left outputs the result directly, relying on the LLM alone.

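The workflow approach on the right asks things like "Do you need any web search?" and splits the user's command into several parts, such as writing a draft, checking it, and polishing it. This gives a mechanism for refining (modification) and repeatedly improving (iteration) the response, with each step's result used as the input to the next step.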
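To make the contrast concrete, here is a minimal Python sketch of the two modes. `call`-style model access is represented by a hypothetical `LLM` type (any function from prompt to text), and an offline stub stands in for a real model; the draft/critique/revise steps mirror the example above and are an assumption, not any particular product's configuration.

```python
from typing import Callable

# Hypothetical model interface: takes a prompt, returns text.
LLM = Callable[[str], str]


def zero_shot(llm: LLM, task: str) -> str:
    """Non-agentic: one prompt in, one answer out."""
    return llm(task)


def agentic_workflow(llm: LLM, task: str, rounds: int = 2) -> str:
    """Agentic: draft first, then alternate critique and revision.

    Each step's output is fed into the next step's prompt.
    """
    draft = llm(f"Write a first draft for: {task}")
    for _ in range(rounds):
        critique = llm(f"Critique this draft for '{task}':\n{draft}")
        draft = llm(f"Revise the draft using the critique.\nDraft:\n{draft}\nCritique:\n{critique}")
    return draft


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Offline stub: records each prompt and returns a numbered placeholder answer.
    calls.append(prompt)
    return f"<answer {len(calls)}>"


print(zero_shot(fake_llm, "write an essay outline on topic X"))         # <answer 1>
print(agentic_workflow(fake_llm, "write an essay outline on topic X"))  # <answer 6>
print(len(calls))                                                        # 6
```

The zero-shot path costs one model call; the workflow with two rounds costs six. That extra cost buys the loop of checking and revising.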

The exact mechanisms depend on the configuration and customization options each workflow product or project offers.

Andrew Ng summarized four design patterns for agents. They are patterns for individual agents, but chaining them together produces a workflow:

Reflection
Tool Use
Planning
Multi-agent collaboration

Andrew Ng's team also compared the two approaches on a coding question, "Given a non-empty list of integers, return the sum of all of the odd elements that are in even positions". The benchmark is HumanEval, a set of programming problems hand-written by OpenAI.
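For a sense of scale, here is a minimal Python solution to that HumanEval problem. In the benchmark's wording, only the odd elements that sit at even, 0-based indices are summed. The sample inputs are illustrative, and the outputs in the comments were worked out by hand. This is not the benchmark's test harness.

```python
def solution(lst: list[int]) -> int:
    """Sum the odd elements that sit at even (0-based) indices."""
    return sum(x for i, x in enumerate(lst) if i % 2 == 0 and x % 2 == 1)


print(solution([5, 8, 7, 1]))       # 12: indices 0 and 2 hold 5 and 7, both odd
print(solution([3, 3, 3, 3, 3]))    # 9: indices 0, 2 and 4
print(solution([30, 13, 24, 321]))  # 0: 30 and 24 are even
```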

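Zero-shot, GPT-4 and GPT-3.5 score 67% and 48% respectively. GPT-3.5 wrapped in an agentic workflow reaches 70%+, beating zero-shot GPT-4. The workflows in question add either a multi-agent mechanism with an intervening agent (Intervenor) or tool use through the ANPL interactive programming system.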

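So, as Andrew Ng himself put it, before GPT-5 or Claude 4 arrive, you may be able to get similar performance from earlier-generation models such as GPT-4 or GPT-3 by running them in an agentic workflow.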

4 patterns

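Reflection: the model reflects on its own output and corrects its mistakes according to the user's instructions. In practice, reflection usually adds a separate agent dedicated to checking.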

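In the figure below, the agent on the left writes code according to the user's instructions, and the agent on the right corrects it.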

The correction prompt is "Check the code carefully for correctness, style and efficiency, and give constructive criticism for how to improve it." Its input is the output of the previous agent.
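The two-agent reflection loop can be sketched in a few lines of Python. The `reflect` function and the `coder`/`critic` callables are hypothetical stand-ins for two LLM-backed agents; the critic's prompt is the one quoted above. In this sketch, the critic signals approval by returning `None`.

```python
from typing import Callable, Optional

CRITIC_PROMPT = (
    "Check the code carefully for correctness, style and efficiency, "
    "and give constructive criticism for how to improve it."
)


def reflect(
    task: str,
    coder: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Coder writes code, critic reviews it; repeat until the critic has no feedback.

    Returns the final code and the number of review rounds used. If max_rounds
    is reached, the last revision is returned without a further review.
    """
    code = coder(task, None)
    for round_no in range(1, max_rounds + 1):
        feedback = critic(CRITIC_PROMPT, code)  # the critic's input is the coder's output
        if feedback is None:                    # no more criticism: stop early
            return code, round_no
        code = coder(task, feedback)            # revise using the critique
    return code, max_rounds


# Scripted stubs in place of real models.
versions = iter(["def f(x): return x*2", "def f(x):\n    return 2 * x"])


def toy_coder(task: str, feedback: Optional[str]) -> str:
    return next(versions)


def toy_critic(prompt: str, code: str) -> Optional[str]:
    return None if "return 2 * x" in code else "Use PEP 8 layout and spacing."


final_code, rounds_used = reflect("double a number", toy_coder, toy_critic)
print(rounds_used)  # 2
print(final_code)
```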

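Tool use: this one is the easiest to understand. It is like adding plugins, that is, other products' API services, such as a Google Search API, an email API, or an AI image-generation API.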
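Mechanically, tool use means the model emits a structured request and the runtime dispatches it to real code. The following Python sketch assumes a JSON reply shaped like `{"tool": ..., "args": {...}}`. That format, the `tool` decorator and both tools are hypothetical, and the tools are offline stubs rather than real search or mail APIs.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> Python function.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("web_search")
def web_search(query: str) -> list[str]:
    # Stub: a real version would call a search API.
    return [f"result for '{query}'"]


@tool("send_email")
def send_email(to: str, body: str) -> str:
    # Stub: a real version would call a mail provider's API.
    return f"queued email to {to} ({len(body)} chars)"


def dispatch(model_output: str) -> Any:
    """Run the tool the model asked for (assumes the JSON format described above)."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['tool']}")
    return fn(**call.get("args", {}))


print(dispatch('{"tool": "web_search", "args": {"query": "agentic workflow"}}'))
print(dispatch('{"tool": "send_email", "args": {"to": "me@example.com", "body": "hi"}}'))
```

Which tools are registered decides what the agent can do. This is why platforms differ so much in practice.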

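As covered in my earlier post on 16 AI workflow automation (no-code AI workflow) platforms, each product supports a different set of tools. Some let you plug in your own APIs, while others limit you to the tools the platform provides. Products outside China support a fairly complete range of foreign APIs. Domestic products are still at an early stage and lack genuinely useful tools.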

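Planning: the LLM is trained to break a user's request down into multiple steps (multiple plans). It also learns the strengths of the available models and tools so it can decide which model handles each plan or sub-task. (It feels a bit like the AI building its own workflow.)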

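Example from the figure below: the AI is asked to generate a photo of a girl reading a book, in a pose similar to the person's pose in a sample photo, and then to describe the result by voice.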

The steps it breaks this into are:

pose determination: first determine the boy's pose, using the openpose model

pose-to-image: generate an image from the pose, using the google/vit model

image-to-text: generate a text description of the image, using the vit-gpt2 model

text-to-speech: read the text aloud, using the fastspeech model
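The resulting plan is simply an ordered list of (sub-task, model) pairs whose outputs feed one another. In this Python sketch the plan is hard-coded, whereas in the setup described an LLM would generate it. The lambdas are hypothetical stubs standing in for the four models named above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    name: str                  # sub-task, e.g. "pose determination"
    model: str                 # model chosen for this sub-task
    run: Callable[[Any], Any]  # hypothetical wrapper around that model


def execute_plan(plan: list[Step], initial_input: Any) -> Any:
    """Run the steps in order; each step consumes the previous step's output."""
    data = initial_input
    for step in plan:
        data = step.run(data)
        print(f"{step.name:<20} [{step.model}] -> {data!r}")
    return data


# Stubs standing in for the four models in the example.
plan = [
    Step("pose determination", "openpose", lambda img: f"pose({img})"),
    Step("pose-to-image", "google/vit", lambda pose: f"girl_reading_book[{pose}]"),
    Step("image-to-text", "vit-gpt2", lambda img: f"caption of {img}"),
    Step("text-to-speech", "fastspeech", lambda text: f"audio<{text}>"),
]

result = execute_plan(plan, "boy.jpg")
```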

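Multi-agent collaboration: several agents are set up to collaborate, each taking a different role. In the reflection example above, for instance, one agent plays the coder and generates code, while another plays the critic and checks and improves it.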

The main difficulty in multi-agent collaboration is dividing the roles clearly, so that the collaboration stays orderly and agents do not overlap or collide.
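One simple way to enforce clear role boundaries is to give each agent exactly one message type it owns. A router then refuses configurations in which two agents claim the same type. The Python sketch below uses hypothetical `Agent` and `Team` classes of its own, not crewAI's. Stub lambdas play the researcher, writer and editor.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    role: str
    handles: str                            # the one message type this role responds to
    act: Callable[[str], tuple[str, str]]   # content -> (next message type, new content)


@dataclass
class Team:
    agents: list[Agent]
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Clear roles: refuse two agents claiming the same message type.
        kinds = [a.handles for a in self.agents]
        if len(kinds) != len(set(kinds)):
            raise ValueError("overlapping roles: each message type needs exactly one owner")
        self._by_kind = {a.handles: a for a in self.agents}

    def run(self, kind: str, content: str, max_turns: int = 10) -> str:
        """Route each message to the single agent that owns its type until 'done'."""
        for _ in range(max_turns):
            if kind == "done":
                return content
            agent = self._by_kind[kind]
            kind, content = agent.act(content)
            self.log.append(f"{agent.role} -> {kind}")
        raise RuntimeError("turn limit reached")


team = Team([
    Agent("researcher", "task", lambda t: ("notes", f"notes on {t}")),
    Agent("writer", "notes", lambda n: ("draft", f"draft from {n}")),
    Agent("editor", "draft", lambda d: ("done", d.upper())),
])
print(team.run("task", "AI trends"))  # DRAFT FROM NOTES ON AI TRENDS
print(team.log)
```

Rejecting overlapping ownership up front is one design choice among many. It trades flexibility for predictable, collision-free hand-offs.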

Beyond the four patterns, Andrew Ng also noted in the talk that larger token budgets, faster token processing and better quality on long texts will all help.

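I recently watched an interview with the founder of Moonshot AI (月之暗面). Back in March and April 2023 they were already betting on long context, and this year their Kimi has become one of the best Chinese LLMs to use.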

personal workflow

My AI writing workflow:

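Search the web for all material on the topic → perplexity, metaso
Organize and filter it, adding content based on my own understanding → ChatGPT4, llama 3
Optimize the title for SEO, polish the wording, remove typos → ChatGPT4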

crewAI

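I chose crewAI because, among the 11 most popular open-source agent projects I reviewed earlier (autoGPT, metaGPT, autoGen, and others), it is the easiest to work with. It also maps neatly onto Andrew Ng's four patterns, which makes it a good way to try out the abstract concepts above in actual code.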

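A commenter on that article suggested Ant Group's agentUniverse. I looked at it too, but it is not especially easy to pick up. A single agent needs both a YAML config file and a .py file, so multiple agents mean multiple files; crewAI is simpler.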

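Looking at that project and then at crewAI, it is clear that agentic workflow and agent frameworks differ in structure. Which one to pick depends on which best fits your needs and best supports the LLMs and tools you want.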

Many agent projects today revolve around the four directions mentioned above:


Reflection

Tool Use

Planning

Multi-agent collaboration

crewAI has been one of the hottest open-source projects outside China over the past month or two, and its GitHub star count has now reached 14.6k.

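Below, crewAI is used to build a team that researches AI developments, with room for human intervention, and compiles the findings into a document.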

# Install first (in a shell): pip install crewai crewai-tools
import os
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

As the code above shows, crewAI's building blocks are agents, tasks, crews and tools.

Agents can use tools. Once several agents have been given roles and tasks, they are combined into a team, called a crew, and you can simply call the crew to work on the problem.

Tasks resemble the planning pattern above: the user's instruction is split into pieces that are assigned to different agents.
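To show how the three pieces fit together, here is a toy Python mimic of a sequential crew. The classes `ToyAgent`, `ToyTask` and `ToyCrew` are hypothetical and are not crewAI's API. The sketch assumes a simplified reading of crewAI's sequential process: tasks run in list order, and each task sees the previous task's output as context. The real framework passes richer context and calls an LLM at each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyAgent:
    role: str
    think: Callable[[str, str], str]  # (task description, context) -> output; stands in for an LLM


@dataclass
class ToyTask:
    description: str
    agent: ToyAgent


@dataclass
class ToyCrew:
    agents: list[ToyAgent]
    tasks: list[ToyTask]

    def kickoff(self) -> str:
        """Run tasks in order; each task receives the previous task's output as context."""
        context = ""
        for task in self.tasks:
            context = task.agent.think(task.description, context)
        return context


toy_researcher = ToyAgent("Senior Research Analyst", lambda d, ctx: f"report[{d}]")
toy_writer = ToyAgent("Tech Content Strategist", lambda d, ctx: f"blog post based on {ctx}")

toy_crew = ToyCrew(
    agents=[toy_researcher, toy_writer],
    tasks=[ToyTask("AI in 2024", toy_researcher), ToyTask("write a blog post", toy_writer)],
)
print(toy_crew.kickoff())  # blog post based on report[AI in 2024]
```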

Preparing the tool and the LLM

# Set the API keys for the Serper web-search tool and for the model
os.environ["SERPER_API_KEY"] = "Your Key" # serper.dev API key
os.environ["OPENAI_API_KEY"] = "Your Key"

# Loading Tools
search_tool = SerperDevTool()
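The snippet above passes both keys through environment variables and leaves the placeholder "Your Key" in place. A small check can make a missing key fail fast, before any agent runs. `missing_env` and `require_env` are hypothetical helpers written for this post, not part of crewAI.

```python
import os

PLACEHOLDER = "Your Key"  # the placeholder string used in the setup snippet


def missing_env(names: list[str]) -> list[str]:
    """Return required variables that are unset, empty, or still the placeholder."""
    return [n for n in names if os.environ.get(n, "") in ("", PLACEHOLDER)]


def require_env(names: list[str]) -> None:
    """Fail fast, before any agent or tool is built, if a key is missing."""
    missing = missing_env(names)
    if missing:
        raise EnvironmentError("missing or placeholder API keys: " + ", ".join(missing))


REQUIRED = ["SERPER_API_KEY", "OPENAI_API_KEY"]
print(missing_env(REQUIRED))  # lists whichever keys still need a real value
```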

Two agents are set up here, a researcher and a writer (you could add a third responsible for SEO optimization).

Each agent has a role, a goal and a backstory (background knowledge).

verbose: when True, the agent logs its execution in detail (reasoning steps, tool calls, intermediate results). When False, the output is much quieter.

allow_delegation: when True, the agent may delegate work to other agents (or ask them questions). When False, it does the task entirely on its own.

tools: the tools the agent can use, here Serper.

max_rpm: the maximum number of requests the agent may make per minute (requests per minute).

cache: whether the agent caches the results of its tool calls.
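To illustrate what the last two parameters mean, the following Python sketch implements a rolling-window requests-per-minute cap and a result cache around a tool. `RpmLimiter` and `CachedTool` are hypothetical classes, not crewAI's internals, and the real implementation may work differently. The clock and sleep functions are injectable so the limiter can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Any, Callable, Hashable


class RpmLimiter:
    """Allow at most max_rpm calls in any rolling 60-second window."""

    def __init__(self, max_rpm: int, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_rpm = max_rpm
        self.clock = clock
        self.sleep = sleep
        self.calls: deque[float] = deque()

    def wait(self) -> None:
        now = self.clock()
        while self.calls and now - self.calls[0] >= 60:  # drop calls older than the window
            self.calls.popleft()
        if len(self.calls) >= self.max_rpm:              # window full: wait for the oldest to expire
            self.sleep(60 - (now - self.calls[0]))
            self.calls.popleft()
            now = self.clock()
        self.calls.append(now)


class CachedTool:
    """Wrap a tool so repeated calls with the same arguments reuse the stored result."""

    def __init__(self, fn: Callable[..., Any], limiter: RpmLimiter, cache: bool = True) -> None:
        self.fn, self.limiter, self.cache = fn, limiter, cache
        self.store: dict[Hashable, Any] = {}
        self.real_calls = 0

    def __call__(self, *args: Hashable) -> Any:
        if self.cache and args in self.store:
            return self.store[args]  # cache hit: no request, no rate-limit slot used
        self.limiter.wait()
        self.real_calls += 1
        result = self.fn(*args)
        if self.cache:
            self.store[args] = result
        return result


search = CachedTool(lambda q: f"results for {q}", RpmLimiter(max_rpm=100))
search("AI 2024")
search("AI 2024")     # served from the cache
search("LLM agents")
print(search.real_calls)  # 2
```

With cache=False, as in the writer agent above, every call would count against the rate limit.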


# Define the agents
researcher = Agent(
  role='Senior Research Analyst',
  goal='Uncover cutting-edge developments in AI and data science',
  backstory=(
    # Adjacent string literals are joined with no separator, so each sentence ends with a space.
    "You are a Senior Research Analyst at a leading tech think tank. "
    "Your expertise lies in identifying emerging trends and technologies in AI and data science. "
    "You have a knack for dissecting complex data and presenting actionable insights."
  ),
  verbose=True,
  allow_delegation=False,
  tools=[search_tool],
  max_rpm=100
)
writer = Agent(
  role='Tech Content Strategist',
  goal='Craft compelling content on tech advancements',
  backstory=(
    "You are a renowned Tech Content Strategist, known for your insightful and engaging articles on technology and innovation. "
    "With a deep understanding of the tech industry, you transform complex concepts into compelling narratives."
  ),
  verbose=True,
  allow_delegation=True,
  tools=[search_tool],
  cache=False, # Disable cache for this agent
)

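After creating the agents, create the tasks. description describes the task, expected_output states the expected result, and agent sets which agent performs it. You can also set human_input: when True, execution pauses for a human's input before moving on to the next task.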

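The description also tells the agent to check with a human whether the draft is acceptable, and to finalize it for the next task only after approval ("Make sure to check with a human if the draft is good before finalizing your answer").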

# Create the tasks
task1 = Task(
  description=(
    "Conduct a comprehensive analysis of the latest advancements in AI in 2024. "
    "Identify key trends, breakthrough technologies, and potential industry impacts. "
    "Compile your findings in a detailed report. "
    "Make sure to check with a human if the draft is good before finalizing your answer."
  ),
  expected_output='A comprehensive full report on the latest AI advancements in 2024, leave nothing out',
  agent=researcher,
  human_input=True,
)

task2 = Task(
  description=(
    "Using the insights from the researcher's report, develop an engaging blog post that highlights the most significant AI advancements. "
    "Your post should be informative yet accessible, catering to a tech-savvy audience. "
    "Aim for a narrative that captures the essence of these breakthroughs and their implications for the future."
  ),
  expected_output='A compelling 3 paragraphs blog post formatted as markdown about the latest AI advancements in 2024',
  agent=writer
)
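The human_input step is a human-in-the-loop approval gate. Here is a framework-free Python sketch of that idea, in which `run_with_human_review` is a hypothetical helper, not crewAI code. In a terminal you would pass `ask_human=input`; here a scripted reviewer stands in so the example runs unattended. An empty reply counts as approval, and anything else is fed back as revision notes.

```python
from typing import Callable, Optional


def run_with_human_review(
    produce: Callable[[Optional[str]], str],
    ask_human: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Produce a draft, show it to a human, and revise until they approve."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = produce(feedback)
        reply = ask_human(f"Draft:\n{draft}\nPress Enter to approve, or type feedback: ").strip()
        if not reply:      # empty reply: approved, hand the draft to the next task
            return draft
        feedback = reply   # otherwise revise using the human's notes
    raise RuntimeError("no approval after max_rounds")


replies = iter(["add a section on open-source models", ""])
final_report = run_with_human_review(
    produce=lambda fb: "report v1" if fb is None else f"report v2 ({fb})",
    ask_human=lambda prompt: next(replies),
)
print(final_report)  # report v2 (add a section on open-source models)
```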

Finally, the agents, tools and tasks above are combined into a crew, which acts as a wrapper, so calling the crew runs the whole process.

# Create the crew
crew = Crew(
  agents=[researcher, writer],
  tasks=[task1, task2],
  verbose=2
)
# Run the crew and get the answer
result = crew.kickoff()

print("######################")
print(result)

I plan to deploy this project soon to see how it performs. To close, here is crewAI's own official introduction:

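Role-based agent design: customize agents with specific roles, goals and tools.
Autonomous inter-agent delegation: agents can autonomously delegate tasks and query one another, making problem solving more efficient.
Flexible task management: define tasks with customizable tools and assign them to agents dynamically.
Process-driven: currently only sequential task execution and hierarchical processes are supported, but more complex processes, such as consensual and autonomous ones, are in development.
Save output as files: save the output of individual tasks as files for later use.
Parse output as Pydantic or JSON: parse the output of individual tasks into a Pydantic model or JSON if you need to.
Works with open-source models: run your crew with OpenAI or open-source models. See the Connect crewAI to LLMs page for details on configuring agents to connect to models, even ones running locally.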
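Parsing a task's output into a typed structure amounts to validating the model's JSON reply against a schema. As a dependency-free stand-in for Pydantic, this Python sketch uses a standard-library dataclass with hand-written checks and saves the parsed result as a JSON file. The `BlogPost` schema and both helper functions are hypothetical, not crewAI's API.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class BlogPost:  # hypothetical schema for the writer's output
    title: str
    paragraphs: list[str]

    def __post_init__(self) -> None:
        # Minimal validation, standing in for what a Pydantic model would enforce.
        if not isinstance(self.title, str) or not self.title:
            raise ValueError("title must be a non-empty string")
        if not isinstance(self.paragraphs, list) or not all(isinstance(p, str) for p in self.paragraphs):
            raise ValueError("paragraphs must be a list of strings")


def parse_output(raw: str) -> BlogPost:
    """Parse a model's raw JSON reply into a typed, validated record."""
    data = json.loads(raw)
    return BlogPost(title=data["title"], paragraphs=data["paragraphs"])


def save_output(post: BlogPost, path: Path) -> None:
    """Persist the parsed output as JSON so later steps can reuse it."""
    path.write_text(json.dumps(asdict(post), ensure_ascii=False, indent=2), encoding="utf-8")


raw_reply = '{"title": "AI in 2024", "paragraphs": ["Agents", "Long context", "Open models"]}'
post = parse_output(raw_reply)
print(post.title, len(post.paragraphs))  # AI in 2024 3
```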

Their official documentation also lists a lot of RAG tools, which look quite tempting.

The photos in this post were taken on 2024.05.02 at Wenyu River Park, Beijing.

博金斯的ai笔记: if this article helped you, feel free to leave a tip to show your support.