LLMs are becoming ever more capable, but they still have a critical weakness: their knowledge updates slowly. Ask a model like ChatGPT about a recent event and it will very likely come up empty. That is why RAG (Retrieval-Augmented Generation) matters: before answering, fetch relevant material first, then have the model generate an answer grounded in that material.

RAG is not a one-size-fits-all scheme, though: some questions need no retrieval at all (definition-style questions, for instance), some are settled by a single retrieval, and others take several attempts (say, rewriting the question first and retrieving again). That is the core of Adaptive RAG: dynamically picking the most suitable strategy for each question.

In this post we build an Adaptive RAG system with LangGraph plus a local LLM (Ollama + Mistral). It can switch flexibly between web search and vectorstore retrieval, and it can correct itself.

Note: our Adaptive RAG system has two main branches:

- Web Search: handles questions about recent events (the vectorstore holds a historical snapshot and will not contain the latest information). We use the Tavily search API to fetch web results, then let the LLM compose the answer.
- Self-Corrective RAG: targets the knowledge base we build ourselves (here we scrape three classic posts by Lilian Weng: Agent, Prompt Engineering, and Adversarial Attack). The vectorstore is built with Chroma, and text embeddings come from a local Nomic embedding model. If the first retrieval comes back irrelevant, the system rewrites the question and retrieves again, and it filters out off-topic documents to keep junk out of the results.

1. Environment Setup

```python
%%capture --no-stderr
%pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python nomic[local]
```

Set the API keys (Tavily search + Nomic embedding):

```python
import getpass, os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("TAVILY_API_KEY")
_set_env("NOMIC_API_KEY")
```
2. Local Model and Vectorstore

We build a vector database over three of Lilian Weng's blog posts. From here on, any question touching Agent, Prompt Engineering, or Adversarial Attack is answered from this store.

```python
# Ollama model
local_llm = "mistral"
```
```python
# Text splitting & vectorization
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_nomic.embeddings import NomicEmbeddings

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
)
retriever = vectorstore.as_retriever()
```
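Before wiring the retriever into a graph, it can help to sanity-check that the store actually returns on-topic chunks. A minimal sketch (the query string here is only an illustrative example, not from the original):

```python
# Hypothetical smoke test: fetch chunks for a sample query and inspect one.
sample_docs = retriever.get_relevant_documents("What is task decomposition?")
print(len(sample_docs))
print(sample_docs[0].page_content[:200])
```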
3. Question Router

The router decides, per question, whether to consult the vectorstore or the web. The test question below concerns agents, so it should be routed to the vectorstore.

```python
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser

llm = ChatOllama(model=local_llm, format="json", temperature=0)

prompt = PromptTemplate(
    template="""You are an expert at routing a user question to a vectorstore or web search... Question to route: {question}""",
    input_variables=["question"],
)
question_router = prompt | llm | JsonOutputParser()
```
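The template above is truncated in this excerpt. For reference, a routing prompt in this style might read as follows; this is a hedged sketch whose exact wording is an assumption, not necessarily the original text:

```python
# Hypothetical full routing prompt (the original text is elided above).
router_template = """You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.
Question to route: {question}"""

prompt = PromptTemplate(template=router_template, input_variables=["question"])
```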
question ="llm agent memory"print(question_router.invoke({"question": question}))
Output:

```
{'datasource': 'vectorstore'}
```

4. Retrieval Grader

```python
retrieval_grader = prompt | llm | JsonOutputParser()

question = "agent memory"
docs = retriever.get_relevant_documents(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
```
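The `prompt` fed into `retrieval_grader` (and the grader and rewriter chains invoked in sections 6-8 below) is not shown in this excerpt; each is a small prompt in the same style as the router. A hedged sketch of how they could be constructed, with wording that is an assumption rather than the original:

```python
# Hypothetical grader/rewriter prompts, modeled on the chains invoked in
# sections 4 and 6-8. The prompt text is an assumption, not the original.
from langchain_core.output_parsers import StrOutputParser

grade_prompt = PromptTemplate(
    template="""You are a grader assessing the relevance of a retrieved document to a user question.
Document: {document}
Question: {question}
If the document contains keywords or semantics related to the question, grade it as relevant.
Return a JSON with a single key 'score', value 'yes' or 'no'.""",
    input_variables=["question", "document"],
)
retrieval_grader = grade_prompt | llm | JsonOutputParser()

hallucination_prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is grounded in a set of facts.
Facts: {documents}
Answer: {generation}
Return a JSON with a single key 'score', value 'yes' or 'no'.""",
    input_variables=["documents", "generation"],
)
hallucination_grader = hallucination_prompt | llm | JsonOutputParser()

answer_prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is useful to resolve a question.
Answer: {generation}
Question: {question}
Return a JSON with a single key 'score', value 'yes' or 'no'.""",
    input_variables=["question", "generation"],
)
answer_grader = answer_prompt | llm | JsonOutputParser()

# The rewriter returns plain text, so it uses a string parser and a non-JSON LLM.
rewrite_prompt = PromptTemplate(
    template="""Rewrite the following question into a better version optimized for vectorstore retrieval.
Question: {question}
Improved question:""",
    input_variables=["question"],
)
question_rewriter = (
    rewrite_prompt | ChatOllama(model=local_llm, temperature=0) | StrOutputParser()
)
```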
The grader returns a JSON with a single `score` key ('yes' or 'no') indicating whether the retrieved document is relevant to the question.

5. Generation

Feeding the retrieved documents and the question to a standard RAG chain produces an explanation of "Agent Memory":

```python
from langchain import hub
from langchain_core.output_parsers import StrOutputParser

# Standard RAG prompt pulled from the LangChain hub
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOllama(model=local_llm, temperature=0)

rag_chain = prompt | llm | StrOutputParser()

question = "agent memory"
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)
```
Output:

```
In an LLM-powered autonomous agent system, the Large Language Model (LLM) functions as the agent's brain...
```

6. Hallucination Grader

This step verifies that the answer is genuinely grounded in the retrieved documents rather than made up. If the answer is not trustworthy, the system retrieves again or rewrites the question.

```python
hallucination_grader = prompt | llm | JsonOutputParser()
hallucination_grader.invoke({"documents": docs, "generation": generation})
```

7. Answer Grader

```python
answer_grader.invoke({"question": question, "generation": generation})
```

The answer grader likewise returns a `score` JSON, this time judging whether the generation actually addresses the question.

8. Question Rewriter

```python
question_rewriter.invoke({"question": question})
```

Output:

```
'What is agent memory and how can it be effectively utilized in vector database retrieval?'
```

9. Web Search

When a question concerns recent events, it goes through Tavily search instead of the local store:

```python
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(k=3)
```

10. Workflow (the LangGraph implementation)

We use LangGraph to wire these steps into a workflow with conditional branches:

- Start → route the question to Web Search or Vectorstore
- Vectorstore path: retrieve → grade the documents
  - If no relevant documents survive: rewrite the question → retrieve again
  - If relevant documents remain: generate an answer → check it
    - If the answer is grounded and useful: return it
    - If not: rewrite the question → retrieve again
- Web Search path: search directly → generate an answer → check it → return the result
In the end, the system switches flexibly across question types instead of rigidly firing one search per question.

```python
from typing import List

from typing_extensions import TypedDict


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """

    question: str
    generation: str
    documents: List[str]


### Nodes

from langchain.schema import Document


def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.get_relevant_documents(question)
    return {"documents": documents, "question": question}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """
    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score["score"]
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}


def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """
    print("---TRANSFORM QUERY---")
    question = state["question"]
    documents = state["documents"]

    # Re-write question
    better_question = question_rewriter.invoke({"question": question})
    return {"documents": documents, "question": better_question}


def web_search(state):
    """
    Web search based on the re-phrased question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with appended web results
    """
    print("---WEB SEARCH---")
    question = state["question"]

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    return {"documents": web_results, "question": question}


### Edges ###


def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """
    print("---ROUTE QUESTION---")
    question = state["question"]
    print(question)
    source = question_router.invoke({"question": question})
    print(source)
    print(source["datasource"])
    if source["datasource"] == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif source["datasource"] == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"


def decide_to_generate(state):
    """
    Determines whether to generate an answer, or re-generate a question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """
    print("---ASSESS GRADED DOCUMENTS---")
    state["question"]
    filtered_documents = state["documents"]

    if not filtered_documents:
        # All documents have been filtered check_relevance
        # We will re-generate a new query
        print("---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---")
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score["score"]

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score["score"]
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
```

```python
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("web_search", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query

# Build graph
workflow.add_conditional_edges(
    START,
    route_question,
    {
        "web_search": "web_search",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "transform_query",
    },
)

# Compile
app = workflow.compile()
```

```python
from pprint import pprint

inputs = {"question": "What is the AlphaCodium paper about?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Node '{key}':")
    pprint("\n---\n")

pprint(value["generation"])
```

Output:

```
---ROUTE QUESTION---
What is the AlphaCodium paper about?
{'datasource': 'web_search'}
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
"Node 'web_search':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('The AlphaCodium paper introduces a new approach for code generation...')
```
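The same `app` can also exercise the vectorstore branch; the question below is an illustrative choice, not from the original run:

```python
# Hypothetical second run: an agent-related question should be routed to the
# vectorstore, graded for relevance, and answered from the indexed blog posts.
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Node '{key}':")
    pprint("\n---\n")

pprint(value["generation"])
```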
This Adaptive RAG system demonstrates a few key points:

- Flexible routing: different questions take different pipelines (Web / Vectorstore).
- Self-correction: when retrieval results are irrelevant, the question is automatically rewritten and retried.
- Quality control: hallucination checking plus answer-usefulness grading keeps fabrication to a minimum.
- Local-first: both the embeddings and the LLM can run locally (privacy-friendly and cost-saving).

Directions for future extension include: a multi-step reasoning route (decompose into sub-questions first, then retrieve); finer-grained routing categories (e.g., structured queries vs. natural-language queries); and integrating a graph database or knowledge graph to strengthen factuality. A sketch of the first idea follows below.
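As a taste of the multi-step direction, decomposition could be one more prompt-chained step placed ahead of `retrieve`. A minimal sketch, where the prompt wording and chain name are assumptions and not part of the system built above:

```python
# Hypothetical sub-question decomposition chain, reusing the local Mistral model.
decompose_prompt = PromptTemplate(
    template="""Break the following question into 2-3 simpler sub-questions,
one per line, that can each be answered by document retrieval.
Question: {question}""",
    input_variables=["question"],
)
question_decomposer = (
    decompose_prompt | ChatOllama(model=local_llm, temperature=0) | StrOutputParser()
)

sub_questions = question_decomposer.invoke(
    {"question": "How do agent memory and prompt engineering interact?"}
).splitlines()
# Each sub-question could then flow through retrieve -> grade -> generate,
# with the partial answers merged by a final synthesis prompt.
```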