LlamaIndex Combined with Ragflow: Building High-Performance LLM RAG Applications

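LlamaIndex and Ragflow together: a strong pairing for building large language model (LLM) applications.
LlamaIndex and Ragflow are two open-source tools that make developers' work considerably easier. LlamaIndex is a data framework that connects LLMs to external data sources: structured data (SQL and NoSQL databases), unstructured data (documents, web pages), and private data (accessed through APIs). Ragflow, as a workflow orchestration tool, focuses on managing the execution of complex LLM pipelines so that each stage of processing runs in order.

The two complement each other. Used together, they cover both the data side and the orchestration side of building capable, scalable LLM applications.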

1 Definitions

1.1 LlamaIndex

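LlamaIndex lets developers connect an LLM to many kinds of external data: structured data (SQL and non-relational databases), unstructured data (documents, web pages), and private data (APIs). With it, developers can build LLM applications that retrieve information from these sources and reason over it.

Its main features are:

Data connectors: a library of prebuilt connectors covers common data sources, so for a supported source developers do not need to write custom loading code.
Data indexing: external data can be indexed so that information in large datasets can be searched and retrieved quickly.
Question answering: questions can be answered from the external data, which makes it straightforward to build Q&A applications for a specific topic or document set.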
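
To make the "connect, index, query" sequence concrete, here is a minimal toy sketch in plain Python. It is not how LlamaIndex is implemented: the `ToyIndex` class, the word-count scoring, and the sample documents are all assumptions made for illustration, and a real index would typically rely on embeddings rather than word overlap.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory index: one bag-of-words vector per document."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents
        self.vectors = [Counter(tokenize(doc)) for doc in documents]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two word-count vectors."""
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def retrieve(self, query: str, top_k: int = 2) -> list[tuple[float, str]]:
        """Return the top_k (score, document) pairs for the query."""
        q = Counter(tokenize(query))
        scored = [
            (self._cosine(q, vec), doc)
            for vec, doc in zip(self.vectors, self.documents)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


docs = [
    "Llama 2 was pretrained on 2 trillion tokens.",
    "SQL databases store structured data in tables.",
    "Web pages and documents are unstructured data.",
]
index = ToyIndex(docs)
hits = index.retrieve("How many tokens was Llama 2 pretrained on?", top_k=1)
print(hits[0][1])
```

The design point carried over from the text is the split between a one-time indexing pass and cheap repeated queries against the stored vectors.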

1.2 Ragflow

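Ragflow is a workflow orchestration tool that manages the execution of complex LLM pipelines. This makes it a good fit for LLM applications that must carry out several kinds of task, including:

Data retrieval: Ragflow can retrieve data from external data sources.
Data processing: Ragflow can process data, for example by cleaning, transforming, and aggregating it.
LLM inference: Ragflow can run LLM inference tasks.
Output generation: Ragflow can produce output in several formats, such as text, tables, or charts.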
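
These four stages can be pictured as a chain of functions. The sketch below is a hypothetical illustration in plain Python, not Ragflow's API: every function name is invented, the "LLM" is a stub that only reports the size of its prompt, and the data is hard-coded.

```python
from typing import Callable


def retrieve(query: str) -> list[str]:
    """Stage 1 (stub): pretend to fetch raw records from an external source."""
    return ["  Llama 2 uses grouped-query attention. ", "", "  AdamW optimizer.  "]


def process(records: list[str]) -> list[str]:
    """Stage 2: clean the records by trimming whitespace and dropping empties."""
    return [r.strip() for r in records if r.strip()]


def infer(query: str, context: list[str], llm: Callable[[str], str]) -> str:
    """Stage 3: build a prompt from the context and pass it to the model."""
    prompt = f"Context: {' '.join(context)}\nQuestion: {query}"
    return llm(prompt)


def render(answer: str) -> str:
    """Stage 4: format the answer as plain text."""
    return f"Answer:\n{answer}"


def stub_llm(prompt: str) -> str:
    """Placeholder model: reports the prompt length instead of completing it."""
    return f"(stub) received a prompt of {len(prompt)} characters"


def run_pipeline(query: str) -> str:
    """Run the four stages in order and return the rendered output."""
    records = process(retrieve(query))
    return render(infer(query, records, stub_llm))


print(run_pipeline("Which optimizer was used?"))
```

Keeping each stage a separate function means any one of them can be swapped (a real retriever, a real model client) without touching the others, which is the property an orchestration layer relies on.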

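1.3 How LlamaIndex and Ragflow Work Together

LlamaIndex and Ragflow can be used together to build capable LLM applications.

LlamaIndex handles the data side: it connects the LLM to data sources and indexes and queries that data, widening the information the model can draw on. Ragflow handles orchestration: it manages the execution of complex LLM pipelines.

Combined, they make it practical to build multi-purpose LLM applications covering tasks such as question answering, text generation, and data analysis.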

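2 Code Implementation

Next, we implement the pipeline step by step. Note that the listings import from LlamaIndex's own workflow module (`llama_index.core.workflow`); no separate Ragflow package is imported. They also use the `@step(pass_context=True)` decorator and the `ctx.data` dictionary as they appear in the original, so check them against the LlamaIndex version you install, since the context API may differ between releases.

Step 1: Install the library, set the API key, and download the data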

!pip install -U llama-index

# Set the API key
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

# Download the data
!mkdir -p data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

Step 2: Workflow events

from llama_index.core.workflow import Event
from llama_index.core.schema import NodeWithScore


class RetrieverEvent(Event):
    """Result of running retrieval."""

    nodes: list[NodeWithScore]


class RerankEvent(Event):
    """Result of reranking the retrieved nodes."""

    nodes: list[NodeWithScore]
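
The event classes matter because each workflow step declares which event type it accepts and which it returns, and those types determine which step runs next. The following is a simplified sketch of that routing idea in plain Python. The `MiniWorkflow` class, its `handlers` mapping, and the three event dataclasses are hypothetical; this is not how `llama_index.core.workflow` is implemented.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    nodes: list[str]


@dataclass
class Reranked:
    nodes: list[str]


@dataclass
class Stop:
    result: str


class MiniWorkflow:
    """Toy event loop: look up a handler by event type until a Stop appears."""

    def __init__(self) -> None:
        # Map each incoming event type to the step that consumes it.
        self.handlers = {Retrieved: self.rerank, Reranked: self.synthesize}

    def rerank(self, ev: Retrieved) -> Reranked:
        # Stand-in for reranking: shortest node first.
        return Reranked(nodes=sorted(ev.nodes, key=len))

    def synthesize(self, ev: Reranked) -> Stop:
        # Stand-in for answer synthesis: join the nodes into one string.
        return Stop(result=" | ".join(ev.nodes))

    def run(self, event: object) -> str:
        while not isinstance(event, Stop):
            event = self.handlers[type(event)](event)
        return event.result


result = MiniWorkflow().run(Retrieved(nodes=["a longer node", "short"]))
print(result)
```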

Step 3: The complete workflow

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


class RAGWorkflow(Workflow):
    @step(pass_context=True)
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Entry point for ingesting documents, triggered by a StartEvent that carries `dirname`."""
        dirname = ev.get("dirname")
        if not dirname:
            return None

        documents = SimpleDirectoryReader(dirname).load_data()
        # Build the vector index and keep it in the shared context
        ctx.data["index"] = VectorStoreIndex.from_documents(
            documents=documents,
            embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
        )
        return StopEvent(result=f"Indexed {len(documents)} documents.")

    @step(pass_context=True)
    async def retrieve(
        self, ctx: Context, ev: StartEvent
    ) -> RetrieverEvent | None:
        """Entry point for RAG, triggered by a StartEvent that carries `query`."""
        query = ev.get("query")
        if not query:
            return None

        print(f"Query the database with: {query}")

        # Store the query in the global context
        ctx.data["query"] = query

        # Fetch the index from the global context
        index = ctx.data.get("index")
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None

        retriever = index.as_retriever(similarity_top_k=2)
        nodes = retriever.retrieve(query)
        print(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step(pass_context=True)
    async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
        # Rerank the retrieved nodes with an LLM
        ranker = LLMRerank(
            choice_batch_size=5, top_n=3, llm=OpenAI(model="gpt-4o-mini")
        )
        print(ctx.data.get("query"), flush=True)
        new_nodes = ranker.postprocess_nodes(
            ev.nodes, query_str=ctx.data.get("query")
        )
        print(f"Reranked nodes to {len(new_nodes)}")
        return RerankEvent(nodes=new_nodes)

    @step(pass_context=True)
    async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
        """Return a streaming response built from the reranked nodes."""
        llm = OpenAI(model="gpt-4o-mini")
        summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
        query = ctx.data.get("query")

        response = await summarizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)
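
One detail in the listing is worth checking: the retriever is configured with `similarity_top_k=2`, while the reranker asks for `top_n=3`. A reranker can only reorder and truncate the candidates it is given, so at most two nodes should reach the synthesis step. The sketch below shows this retrieve-then-rerank pattern in plain Python; the function names and both score tables are toy stand-ins (assumptions), not what `LLMRerank` does internally.

```python
def retrieve(candidates: dict[str, float], top_k: int) -> list[str]:
    """First pass: keep the top_k items by a cheap similarity score."""
    ranked = sorted(candidates, key=lambda name: candidates[name], reverse=True)
    return ranked[:top_k]


def rerank(nodes: list[str], relevance: dict[str, float], top_n: int) -> list[str]:
    """Second pass: reorder by a (pretend) costlier score, then truncate."""
    ranked = sorted(nodes, key=lambda name: relevance[name], reverse=True)
    return ranked[:top_n]


# Hypothetical scores for four chunks of a document.
similarity = {"chunk_a": 0.91, "chunk_b": 0.87, "chunk_c": 0.40, "chunk_d": 0.35}
relevance = {"chunk_a": 0.2, "chunk_b": 0.9, "chunk_c": 0.8, "chunk_d": 0.1}

first_pass = retrieve(similarity, top_k=2)
final = rerank(first_pass, relevance, top_n=3)
print(first_pass)
print(final)
```

In this toy data `chunk_c` would have ranked second after reranking, but it never reaches the reranker because the first pass kept only two candidates. Setting the retrieval `top_k` higher than the reranker's `top_n` gives the second pass something to choose from.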

Step 4: Run the workflow

w = RAGWorkflow()
# Ingest the documents
await w.run(dirname="data")
# Run a query
result = await w.run(query="How was Llama2 trained?")
async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)

Query the database with: How was Llama2 trained?
Retrieved 2 nodes.
Llama 2 was trained through a multi-step process that began with pretraining using publicly available online sources. This was followed by the creation of an initial version of Llama 2-Chat through supervised fine-tuning. The model was then iteratively refined using Reinforcement Learning with Human Feedback (RLHF) methodologies, which included techniques like rejection sampling and Proximal Policy Optimization (PPO).

During pretraining, the model utilized an optimized auto-regressive transformer architecture, incorporating robust data cleaning, updated data mixes, and training on a significantly larger dataset of 2 trillion tokens. The training process also involved increased context length and the use of grouped-query attention (GQA) to enhance inference scalability.

The training employed the AdamW optimizer with specific hyperparameters, a cosine learning rate schedule, and gradient clipping. The models were pretrained on Meta's Research SuperCluster and internal production clusters, utilizing NVIDIA A100 GPUs for the training process.
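
The top-level `await` and `async for` in Step 4 work in a notebook, which already runs an event loop. In a standalone script they need to be wrapped in a coroutine and started with `asyncio.run`. The sketch below shows that pattern with a stand-in async generator; `fake_response_gen` is a hypothetical placeholder for the streamed response, not a LlamaIndex function.

```python
import asyncio
from typing import AsyncIterator


async def fake_response_gen() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming response: yields text chunks."""
    for chunk in ["Llama 2 ", "was pretrained ", "on 2 trillion tokens."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield chunk


async def main() -> str:
    """Consume the stream chunk by chunk, printing as each chunk arrives."""
    pieces = []
    async for chunk in fake_response_gen():
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()
    return "".join(pieces)


# In a script, asyncio.run starts the event loop that a notebook provides implicitly.
text = asyncio.run(main())
```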

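3 Conclusion

LlamaIndex and Ragflow both have a place in the LLM application development landscape. Each of these open-source tools brings distinct strengths to building LLM-based applications.

LlamaIndex connects to data sources and processes the data, while Ragflow orchestrates the workflow. Together they offer a comprehensive approach to building capable, scalable LLM applications.

For practitioners, exploring LlamaIndex and Ragflow and learning how to build applications with them is a good way to keep up with the field and sharpen development skills. We hope readers will put them to work in practice and help move LLM applications forward.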