链载Ai

Title: Multi-Document Agentic RAG Workflow

Author: 链载Ai    Date: 2025-12-2 10:00

Introduction

Large language models (LLMs) have changed how we extract insights from large volumes of text. In financial analysis, LLM applications are being designed to help analysts answer complex questions about company performance, earnings reports, and market trends.

One such application uses a retrieval-augmented generation (RAG) pipeline to extract information from financial statements and other sources.

Suppose a financial analyst wants to understand the key takeaways from a company's Q2 earnings call, with a particular focus on the technological moats the company is building. A question like this goes beyond a simple lookup and calls for a more sophisticated approach. This is where LLM agents come in.

What Is an Agent?

According to LlamaIndex, an "agent" is an automated reasoning and decision engine. It takes a user input or query and can make internal decisions about how to execute that query in order to return the correct result. Key agent components can include, but are not limited to:

An LLM agent is a system that combines techniques such as planning, tailored focus, memory utilization, and the use of different tools to answer complex questions.

Let's break down how an LLM agent can be developed to answer the question above:

Source: generic components of an agent

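Tool Calling

In standard RAG, the LLM is used mainly for information synthesis.

Tool calling, on the other hand, adds a layer of query understanding on top of the RAG pipeline, letting users ask complex queries and get more precise results. It allows the LLM to figure out how to use the vector database, rather than merely consuming its output.

Tool calling lets the LLM interact with external environments through a dynamic interface: it helps not only to select the appropriate tool but also to infer the arguments needed to execute it. As a result, the request is understood better and the response is better than with standard RAG.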
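To make the two halves of tool calling concrete (choosing a tool and inferring its arguments), here is a minimal, library-free sketch. The tool registry, the fake_llm_decision function, and the keyword rule it applies are all hypothetical stand-ins for what a real LLM would decide; none of this is LlamaIndex API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical tools: plain functions the "LLM" can choose between.
def vector_search(query: str, page_numbers: Optional[List[str]] = None) -> str:
    pages = ",".join(page_numbers) if page_numbers else "all"
    return f"vector_search(query={query!r}, pages={pages})"

def summarize(query: str) -> str:
    return f"summarize(query={query!r})"

TOOLS: Dict[str, Callable[..., str]] = {
    "vector_search": vector_search,
    "summarize": summarize,
}

def fake_llm_decision(user_query: str) -> dict:
    """Stand-in for the LLM: pick a tool name and infer its arguments."""
    words = user_query.lower().replace(",", " ").split()
    if "page" in words:
        # Infer the page argument from the word that follows "page".
        idx = words.index("page")
        page = words[idx + 1] if idx + 1 < len(words) else None
        args = {"query": user_query, "page_numbers": [page] if page else None}
        return {"tool": "vector_search", "args": args}
    return {"tool": "summarize", "args": {"query": user_query}}

def call_tool(user_query: str) -> str:
    """Dispatch to the chosen tool with the inferred arguments."""
    decision = fake_llm_decision(user_query)
    return TOOLS[decision["tool"]](**decision["args"])

print(call_tool("What is discussed on page 2"))
print(call_tool("Give me an overview"))
```

In a real agent the decision comes from the model's function-calling output rather than a keyword rule, but the dispatch step looks much the same.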

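Agent Reasoning Loop

What if the user asks a complex question that involves multiple steps, or a vague question that needs clarification? This is where the agent reasoning loop comes in. Instead of making a single one-shot call, the agent can reason over tools across multiple steps.

Source: LlamaIndex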
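The reasoning loop can be sketched in plain Python as "decide, act, observe, repeat". In this illustration the scripted_policy function plays the role of the LLM and the two lookup tools are hypothetical; it is a sketch of the control flow only, not the loop LlamaIndex runs internally.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools available to the agent.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_self_rag": lambda q: "Self-RAG notes",
    "lookup_crag": lambda q: "CRAG notes",
}

def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: decide the next action from what was seen so far.

    Returns ("call", tool_name) or ("finish", final_answer).
    """
    if len(observations) == 0:
        return ("call", "lookup_self_rag")
    if len(observations) == 1:
        return ("call", "lookup_crag")
    return ("finish", " vs. ".join(observations))

def run_agent(question: str, max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, payload = scripted_policy(question, observations)
        if action == "finish":
            return payload
        # Call the chosen tool and feed its output into the next decision.
        observations.append(TOOLS[payload](question))
    return "stopped: step limit reached"

print(run_agent("Compare Self-RAG and CRAG"))
```

The step limit is a common safeguard so that a policy that never decides to finish cannot loop forever.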

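Agent Architecture

In LlamaIndex, an agent consists of two components: AgentRunner and AgentWorker.

AgentRunner objects interact with AgentWorkers.

AgentRunners are orchestrators that store:

AgentWorkers are responsible for:

Source: LlamaIndex

Calling the agent's query method queries the agent in a one-shot manner, without preserving state. This is where memory comes in: it maintains the conversation history. The agent keeps the chat history in a conversational memory buffer. By default, the memory buffer is a flat list of items, a rolling buffer whose size depends on the LLM's context window. So when the agent decides to use a tool, it draws on the previous conversation history as well as the current chat to carry out the next set of actions.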
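A rolling memory buffer of this kind can be sketched as follows. This is a simplified illustration, assuming whitespace-separated words as a stand-in for real tokens; the RollingChatMemory class is hypothetical and is not LlamaIndex's memory implementation.

```python
from collections import deque
from typing import Deque, List, Tuple

class RollingChatMemory:
    """Toy rolling buffer: keeps the most recent messages within a token budget."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: Deque[Tuple[str, str]] = deque()

    @staticmethod
    def _count_tokens(text: str) -> int:
        # Assumption: whitespace-separated words approximate tokens.
        return len(text.split())

    def _total(self) -> int:
        return sum(self._count_tokens(content) for _, content in self.messages)

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))
        # Evict the oldest messages until the buffer fits the budget again,
        # always keeping at least the newest message.
        while self._total() > self.token_limit and len(self.messages) > 1:
            self.messages.popleft()

    def get(self) -> List[Tuple[str, str]]:
        return list(self.messages)

memory = RollingChatMemory(token_limit=8)
memory.put("user", "Summarize the Self-RAG paper")    # 4 words
memory.put("assistant", "It adds reflection tokens")  # 4 words, total 8
memory.put("user", "How does CRAG differ")            # total 12, oldest evicted
print(memory.get())
```

The budget here is tiny so the eviction is visible; in practice it would be derived from the model's context window.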

Here we build a multi-document agent that handles multiple documents. We implement agentic RAG over 3 documents, and the same approach extends to more.

Tech Stack Used

Mistral Large brings new capabilities and strengths:

Code Implementation

The code is implemented in Google Colab.

Install the required dependencies:

%%writefile requirements.txt
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai

!pip install -r requirements.txt

Download the documents to process:

!mkdir data

! wget "https://arxiv.org/pdf/1810.04805.pdf" -O ./data/BERT_arxiv.pdf
! wget "https://arxiv.org/pdf/2005.11401" -O ./data/RAG_arxiv.pdf
! wget "https://arxiv.org/pdf/2310.11511" -O ./data/self_rag_arxiv.pdf
! wget "https://arxiv.org/pdf/2401.15884" -O ./data/crag_arxiv.pdf

Import the required dependencies:

from llama_index.core import SimpleDirectoryReader,VectorStoreIndex,SummaryIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool,QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters,FilterCondition
from typing import List,Optional


import nest_asyncio
nest_asyncio.apply()

Read the documents:

documents = SimpleDirectoryReader(input_files = ['./data/self_rag_arxiv.pdf']).load_data()
print(len(documents))
print(f"Document Meta{documents[0].metadata}")

Split the documents into chunks/nodes

splitter = SentenceSplitter(chunk_size=1024,chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Length of nodes : {len(nodes)}")
print(f"get the content for node 0 :{nodes[0].get_content(metadata_mode='all')}")

Output:

Length of nodes : 43
get the content for node 0 :page_label: 1
file_name: self_rag_arxiv.pdf
file_path: data/self_rag_arxiv.pdf
file_type: application/pdf
file_size: 1405127
creation_date: 2024-05-11
last_modified_date: 2023-10-19

Preprint.
SELF-RAG: LEARNING TO RETRIEVE , GENERATE ,AND
CRITIQUE THROUGH SELF-REFLECTION
... (truncated)
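The chunk_size and chunk_overlap parameters control how large each chunk is and how much text neighboring chunks share. As a rough illustration only, the sketch below applies the same idea to a list of words with a fixed sliding window. This is a simplification: the function name split_with_overlap is hypothetical, and SentenceSplitter itself measures size in tokens and tries to respect sentence boundaries.

```python
from typing import List

def split_with_overlap(words: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Fixed-size sliding window; consecutive chunks share `chunk_overlap` words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing some text twice.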

Instantiate the vector store

import chromadb
db = chromadb.PersistentClient(path="./chroma_db_mistral")
chroma_collection = db.get_or_create_collection("multidocument-agent")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Instantiate the embedding model

from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings

embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

Settings.embed_model = embed_model

Settings.chunk_size = 1024
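
An embedding model maps text to vectors so that a query can be matched against chunks by similarity. The following is a minimal sketch of that retrieval step, assuming a toy bag-of-words vector in place of BAAI/bge-small-en-v1.5; the embed, cosine, and top_k helpers are hypothetical and are not part of FastEmbed or LlamaIndex.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Dict[str, int]:
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k most similar."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

chunks = [
    "self rag learns to retrieve generate and critique",
    "bert is a bidirectional transformer encoder",
    "corrective rag evaluates retrieved documents",
]
results = top_k("how does self rag critique", chunks, k=2)
for score, chunk in results:
    print(round(score, 3), chunk)
```

A learned embedding model captures meaning rather than exact word overlap, but the ranking step (score, sort, keep the top k) is the part that similarity_top_k controls.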

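Instantiate the LLM

import os
from google.colab import userdata  # Colab secrets store that holds MISTRAL_API_KEY
from llama_index.llms.mistralai import MistralAI

os.environ["MISTRAL_API_KEY"] = userdata.get("MISTRAL_API_KEY")
llm = MistralAI(model="mistral-large-latest")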

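Brief overview:

The code so far shows how to split text into chunks and store them in a vector database, and how to instantiate the embedding model and the LLM. The sample document it loads presents Self-RAG, a framework for improving the quality and accuracy of LLM generation.

Note that parts of the code may need to be adapted to your setup; for example, set MISTRAL_API_KEY to your own key and adjust other parameters as needed.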

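Instantiate the vector query tool and summary tool for a specific document

LlamaIndex data agents take natural-language input and perform actions rather than only generating responses. The key to building effective data agents lies in tool abstractions. But what exactly is a tool in this context? A tool can be thought of as an API interface designed for agents, rather than humans, to interact with.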
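One way to picture a tool is as a callable bundled with the metadata an agent needs in order to choose it: a name, a description, and a list of parameters. The sketch below builds such a bundle from an ordinary Python function using only the standard library. SimpleTool and make_tool are hypothetical names for illustration; this is not how FunctionTool is implemented.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SimpleTool:
    """Hypothetical tool wrapper: the callable plus the metadata an agent reads."""
    name: str
    description: str
    parameters: Dict[str, str]
    fn: Callable

def make_tool(fn: Callable, name: str) -> SimpleTool:
    sig = inspect.signature(fn)
    # Record each parameter name with a printable form of its annotation.
    params = {p.name: str(p.annotation) for p in sig.parameters.values()}
    # Description = signature line followed by the function's docstring.
    description = f"{name}{sig}\n{inspect.getdoc(fn) or ''}"
    return SimpleTool(name=name, description=description, parameters=params, fn=fn)

def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

tool = make_tool(add, name="add_tool")
print(tool.description)
print(tool.fn(2, 3))
```

Because the description is all the model sees when selecting a tool, a clear docstring matters as much as the function body.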

Core concepts:

Several types of tools are available:

name = "BERT_arxiv"
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
    '''
    perform vector search over index on
    query(str): query string needs to be embedded
    page_numbers(List[str]): list of page numbers to be retrieved,
    leave blank if we want to perform a vector search over all pages
    '''
    page_numbers = page_numbers or []
    # One filter per requested page; combined with OR so any listed page matches
    metadata_dict = [{"key": 'page_label', "value": p} for p in page_numbers]

    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
    )

    response = query_engine.query(query)
    return response

vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}", fn=vector_query)

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_query_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{name}",
    query_engine=summary_query_engine,
    description=("Use ONLY IF you want to get a holistic summary of the documents."
                 "DO NOT USE if you have specified questions over the documents."),
)

Test the LLM

response = llm.predict_and_call([vector_query_tool],
"Summarize the content in page number 2",
verbose=True)

=== Calling Function ===
Calling function: vector_tool_BERT_arxiv with args: {"query": "summarize content", "page_numbers": ["2"]}
=== Function Output ===
The content discusses the use of RAG models for knowledge-intensive generation tasks, such as MS-MARCO and Jeopardy question generation, showing that the models produce more factual, specific, and diverse responses compared to a BART baseline. The models also perform well in FEVER fact verification, achieving results close to state-of-the-art pipeline models. Additionally, the models demonstrate the ability to update their knowledge as the world changes by replacing the non-parametric memory.
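
The page_numbers argument in the call above is turned into metadata filters joined with an OR condition. A plain-Python sketch of that filtering logic is shown below; the node dictionaries and the filter_by_pages helper are hypothetical, and the real filtering is carried out by the vector store rather than in Python.

```python
from typing import Dict, List, Optional

# Hypothetical nodes: text plus a metadata dict, similar to what a PDF reader yields.
nodes = [
    {"text": "intro", "metadata": {"page_label": "1"}},
    {"text": "method", "metadata": {"page_label": "2"}},
    {"text": "results", "metadata": {"page_label": "3"}},
]

def filter_by_pages(nodes: List[Dict], page_numbers: Optional[List[str]] = None) -> List[Dict]:
    """Keep nodes whose page_label equals ANY requested page (OR condition).

    An empty or missing page list means "no filter": every node is kept.
    """
    page_numbers = page_numbers or []
    if not page_numbers:
        return list(nodes)
    filters = [{"key": "page_label", "value": p} for p in page_numbers]
    return [
        n for n in nodes
        if any(n["metadata"].get(f["key"]) == f["value"] for f in filters)
    ]

print([n["text"] for n in filter_by_pages(nodes, ["1", "3"])])
print(len(filter_by_pages(nodes)))
```

Note that page labels are compared as strings, which is why the tool call passes "2" rather than the integer 2.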

Helper function that builds the vector store tool and summary tool for every document

def get_doc_tools(file_path: str, name: str) -> tuple[FunctionTool, QueryEngineTool]:
    '''
    get vector query and summary query tools from a document
    '''

    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    print("length of nodes")
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"Length of nodes : {len(nodes)}")

    vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
    vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        '''
        perform vector search over index on
        query(str): query string needs to be embedded
        page_numbers(List[str]): list of page numbers to be retrieved,
        leave blank if we want to perform a vector search over all pages
        '''
        page_numbers = page_numbers or []
        metadata_dict = [{"key": 'page_label', "value": p} for p in page_numbers]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
        )

        response = query_engine.query(query)
        return response

    vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}", fn=vector_query)

    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_query_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=("Use ONLY IF you want to get a holistic summary of the documents."
                     "DO NOT USE if you have specified questions over the documents."),
    )
    return vector_query_tool, summary_query_tool
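
The summary tool uses response_mode="tree_summarize". As a rough mental model, and not LlamaIndex's actual implementation, tree summarization can be pictured as summarizing groups of chunks, then summarizing those summaries, until a single text remains. In the sketch below the summarize callable stands in for an LLM call, and both tree_summarize and first_words are hypothetical helpers.

```python
from typing import Callable, List

def tree_summarize(chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Each pass shrinks the list by a factor of roughly fan_in.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]

def first_words(group: List[str]) -> str:
    """Stand-in for an LLM summary: keep the first word of each text in the group."""
    return " ".join(text.split()[0] for text in group)

summary = tree_summarize(["alpha one", "beta two", "gamma three"], first_words)
print(summary)
```

The appeal of the tree shape is that no single call has to fit the whole document into the context window.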

Prepare input lists with the document names and paths

import os

root_path = "/content/data"
file_name = []
file_path = []
for file in os.listdir(root_path):
    if file.endswith(".pdf"):
        # Tool names are built from the file stem (the text before the first dot)
        file_name.append(file.split(".")[0])
        file_path.append(os.path.join(root_path, file))

print(file_name)
print(file_path)

Output (as recorded):

['self_rag_arxiv', 'crag_arxiv', 'RAG_arxiv', '', 'BERT_arxiv']
['/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf']

Note: FunctionTool expects tool names to match the pattern ^[a-zA-Z0-9_-]+$
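
Since tool names are derived from file names, it can help to check them against that pattern before building the tools. Below is a small sketch using the standard re module; is_valid_tool_name and sanitize_tool_name are hypothetical helpers, and replacing disallowed characters with underscores is just one possible policy.

```python
import re

TOOL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")

def is_valid_tool_name(name: str) -> bool:
    """True if the whole name consists of letters, digits, underscores, or hyphens."""
    return bool(TOOL_NAME_PATTERN.fullmatch(name))

def sanitize_tool_name(name: str) -> str:
    """Replace every disallowed character with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

print(is_valid_tool_name("vector_tool_BERT_arxiv"))
print(is_valid_tool_name("vector_tool_my paper.v2"))
print(sanitize_tool_name("vector_tool_my paper.v2"))
```

An empty file stem still yields a non-empty tool name once the vector_tool_ or summary_tool_ prefix is added, but the resulting name carries no information about the document.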

Generate the vector tool and summary tool for each document

papers_to_tools_dict = {}
for name, filename in zip(file_name, file_path):
    vector_query_tool, summary_query_tool = get_doc_tools(filename, name)
    papers_to_tools_dict[name] = [vector_query_tool, summary_query_tool]
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28

Put the tools into a flat list

initial_tools = [t for f in file_name for t in papers_to_tools_dict[f]]
initial_tools

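Stuffing too many tool choices into the LLM prompt leads to the following problems:

The solution here is to perform RAG at the tool level. To do this, we use the ObjectIndex class from LlamaIndex.

The ObjectIndex class allows arbitrary Python objects to be indexed, which makes it flexible and suitable for a wide range of use cases. For example:

VectorStoreIndex is a key component of LlamaIndex that supports storing and retrieving data. It works as follows: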

from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(initial_tools,index_cls=VectorStoreIndex)

Set up the ObjectIndex as a retriever

obj_retriever = obj_index.as_retriever(similarity_top_k=2)
tools = obj_retriever.retrieve("compare and contrast the papers self rag and corrective rag")

print(tools[0].metadata)
print(tools[1].metadata)
ToolMetadata(description='Use ONLY IF you want to get a holistic summary of the documents.DO NOT USE if you have specified questions over the documents.', name='summary_tool_self_rag_arxiv', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

ToolMetadata(description='vector_tool_self_rag_arxiv(query: str, page_numbers: Optional[List[str]] = None) -> str\n\nperform vector search over index on\nquery(str): query string needs to be embedded\npage_numbers(List[str]): list of page numbers to be retrieved,\nleave blank if we want to perform a vector search over all pages\n', name='vector_tool_self_rag_arxiv', fn_schema=<class 'pydantic.v1.main.vector_tool_self_rag_arxiv'>, return_direct=False)
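
Retrieval over tools works like retrieval over documents: each tool's metadata is scored against the query and only the best matches are handed to the LLM. The sketch below illustrates this with a simple word-overlap score in place of embeddings. The tool descriptions are hypothetical (they mention the paper each tool covers), and overlap_score and retrieve_tools are illustrative helpers, not ObjectIndex API.

```python
from typing import List, Tuple

# Hypothetical (name, description) pairs standing in for tool metadata.
TOOLS: List[Tuple[str, str]] = [
    ("summary_tool_self_rag_arxiv", "holistic summary of the self rag paper"),
    ("summary_tool_crag_arxiv", "holistic summary of the corrective rag paper"),
    ("summary_tool_BERT_arxiv", "holistic summary of the bert paper"),
]

def overlap_score(query: str, description: str) -> int:
    """Toy relevance score: number of distinct query words found in the description."""
    return len(set(query.lower().split()) & set(description.lower().split()))

def retrieve_tools(query: str, top_k: int = 2) -> List[str]:
    """Rank all tools by score and return the names of the top_k best matches."""
    ranked = sorted(TOOLS, key=lambda t: overlap_score(query, t[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

selected = retrieve_tools("compare self rag and corrective rag")
print(selected)
```

A retriever can only tell tools apart by what their metadata says, so descriptions that name the document a tool covers give it more to work with than a generic description shared by every tool.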

Set up the RAG agent

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt="""You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question.Do not rely on prior knowledge.""",
    verbose=True,
)
agent = AgentRunner(agent_worker)

Question 1

response = agent.query("Compare and contrast self rag and crag.")
print(str(response))

Added user message to memory: Compare and contrast self rag and crag.
=== LLM Response ===
Sure, I'd be happy to help you understand the differences between Self RAG and CRAG, based on the functions provided to me.

Self RAG (Retrieval-Augmented Generation) is a method where the model generates a holistic summary of the documents provided as input. It's important to note that this method should only be used if you want a general summary of the documents, and not if you have specific questions over the documents.

On the other hand, CRAG (Contrastive Retrieval-Augmented Generation) is also a method for generating a holistic summary of the documents. The key difference between CRAG and Self RAG is not explicitly clear from the functions provided. However, the name suggests that CRAG might use a contrastive approach in its retrieval process, which could potentially lead to a summary that highlights the differences and similarities between the documents more effectively.

Again, it's crucial to remember that both of these methods should only be used for a holistic summary, and not for answering specific questions over the documents.

Question 2

response = agent.query("Summarize the paper corrective RAG.")
print(str(response))

Added user message to memory: Summarize the paper corrective RAG.
=== Calling Function ===
Calling function: summary_tool_RAG_arxiv with args: {"input": "corrective RAG"}
=== Function Output ===
The corrective RAG approach is a method used to address issues or errors in a system by categorizing them into three levels: Red, Amber, and Green. Red signifies critical problems that need immediate attention, Amber indicates issues that require monitoring or action in the near future, and Green represents no significant concerns. This approach helps prioritize and manage corrective actions effectively based on the severity of the identified issues.
=== LLM Response ===
The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.
assistant: The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.

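Conclusion

Unlike a standard RAG pipeline, which suits simple queries over a small number of documents, this agentic approach adapts based on its initial findings to improve subsequent retrieval. Here we have built an autonomous research agent that strengthens our ability to engage with and analyze data comprehensively.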






