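Another open-source RAG arrives: HKU open-sources a multimodal RAG tool with unified multi-format document parsing, knowledge graph indexing, and hybrid retrieval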

In AI-driven information retrieval, traditional RAG (retrieval-augmented generation) systems are usually limited to text and struggle with complex documents that mix text, images, tables, and formulas.

RAG-Anything is an open-source multimodal RAG system developed by the Data Intelligence Lab at the University of Hong Kong. It provides an end-to-end solution from document ingestion to intelligent querying.

Unlike traditional RAG, it combines a multimodal knowledge graph, a flexible parsing architecture, and a hybrid retrieval mechanism to return context-aware, high-precision query results, which significantly improves the handling of complex documents.

It covers text, images, tables, formulas, and other multimodal content in a single end-to-end workflow.

Core advantages

•End-to-end multimodal pipeline: an integrated workflow from document parsing to multimodal intelligent querying.
•Multi-format document support: compatible with PDF, Office documents (DOC/DOCX, PPT/PPTX, XLS/XLSX), images (JPG, PNG, etc.), and text files (TXT, MD).
•Multimodal content analysis engine: dedicated processors for images, tables, formulas, and general text, so that each content type is parsed accurately.
•Knowledge graph indexing: automatically extracts entities and cross-modal relations to build a network of semantic connections.
•Flexible processing architecture: supports both the MinerU intelligent parsing mode and a direct multimodal content insertion mode, to suit different scenarios.
•Cross-modal retrieval: intelligent retrieval across text and multimodal content, with precise information location and matching.

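Algorithm and architecture

RAG-Anything uses a layered architecture that extends traditional RAG with a multi-stage pipeline for heterogeneous content. Its workflow consists of document parsing, content analysis, knowledge graph construction, and intelligent retrieval.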

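1. Document parsing stage

Goal: extract and structure multimodal elements (text, images, tables, formulas) from documents in multiple formats.

Core components:

•Structured extraction engine: integrates MinerU for high-precision recognition of document structure and content extraction.
•Adaptive content decomposition: separates text, images, tables, and formulas while preserving their semantic links (for example, an image and its caption).
•Multi-format compatibility: supports PDF, Office documents, images, and text files, and outputs standardized multimodal content.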
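
As a rough illustration of the multi-format compatibility described above, the sketch below maps a file extension to a format family before any parsing happens. The function name `detect_format_family` and the family labels are hypothetical, chosen only for this example. They are not part of RAG-Anything's API, and the extension list covers only the formats named in this article (with `.jpeg` added as an alias of JPG).

```python
from pathlib import Path

# Format families named in the article; the labels themselves are illustrative.
FORMAT_FAMILIES = {
    "pdf": {".pdf"},
    "office": {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"},
    "image": {".jpg", ".jpeg", ".png"},
    "text": {".txt", ".md"},
}


def detect_format_family(file_path: str) -> str:
    """Return the format family for a path, or 'unsupported' if unknown."""
    suffix = Path(file_path).suffix.lower()  # case-insensitive match
    for family, extensions in FORMAT_FAMILIES.items():
        if suffix in extensions:
            return family
    return "unsupported"


for name in ["report.PDF", "slides.pptx", "figure.png", "notes.md", "archive.zip"]:
    print(name, "->", detect_format_family(name))
```

A lookup like this is only the first step. The actual structure recognition is delegated to MinerU in the system described here.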

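2. Multimodal content understanding

Goal: process heterogeneous content in parallel through dedicated pipelines, for both efficiency and completeness.

Core components:

•Content classification and routing: automatically identifies the content type and assigns it to the matching processing channel.
•Concurrent multi-pipeline execution: processes text and multimodal content in parallel to maximize throughput.
•Document hierarchy extraction: preserves structural relations such as sections and table titles to keep the semantics intact.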
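
The routing and concurrency ideas above can be sketched with plain `asyncio`. This is a minimal sketch under stated assumptions: the handler functions, the `ROUTES` table, and the item dictionaries are all hypothetical stand-ins, and real processors would call models rather than format strings. It is not how RAG-Anything implements its pipelines internally.

```python
import asyncio


# Hypothetical handlers, one per content type.
async def handle_text(item: dict) -> str:
    return f"text:{item['content']}"


async def handle_image(item: dict) -> str:
    return f"image:{item['content']}"


async def handle_table(item: dict) -> str:
    return f"table:{item['content']}"


ROUTES = {"text": handle_text, "image": handle_image, "table": handle_table}


async def route_and_process(items: list) -> list:
    """Dispatch each item to the handler for its type and run all handlers concurrently."""
    # Unknown types fall back to the text handler in this sketch.
    tasks = [ROUTES.get(item["type"], handle_text)(item) for item in items]
    # gather returns results in input order, so document order is preserved.
    return await asyncio.gather(*tasks)


items = [
    {"type": "text", "content": "Introduction"},
    {"type": "image", "content": "fig1.jpg"},
    {"type": "table", "content": "Table 1"},
]
results = asyncio.run(route_and_process(items))
print(results)
```

Because `asyncio.gather` keeps the input order, the concurrent run does not scramble the order in which elements appeared in the document.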

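3. Multimodal analysis engine

Goal: deploy a dedicated processor for each modality to ensure accurate parsing.

Core components:

•Visual content analyzer: based on CLIP-like vision models; generates context-aware image captions and extracts spatial relations.
•Structured data interpreter: applies statistical pattern recognition to tables to analyze trends and semantic dependencies.
•Mathematical expression parser: parses LaTeX formulas and maps them to domain knowledge bases.
•Extensible modality processors: a plugin architecture supports custom content types and dynamic pipeline configuration.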
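
One common way to build a plugin architecture like the one in the last bullet is a registry keyed by content type. The sketch below shows that pattern only. The names `PROCESSORS`, `register_processor`, and `process` are hypothetical and are not taken from RAG-Anything, whose own extension point is subclassing a modal processor class, as Demo04 later in this article shows.

```python
from typing import Callable, Dict

# Hypothetical registry: content type -> function that produces a description.
PROCESSORS: Dict[str, Callable[[dict], str]] = {}


def register_processor(content_type: str):
    """Decorator that registers a processor function for one content type."""
    def decorator(func):
        PROCESSORS[content_type] = func
        return func
    return decorator


@register_processor("equation")
def process_equation(content: dict) -> str:
    return f"LaTeX equation: {content['latex']}"


@register_processor("generic")
def process_generic(content: dict) -> str:
    return f"Generic content with keys: {sorted(content)}"


def process(content_type: str, content: dict) -> str:
    """Use the registered processor, falling back to the generic one."""
    processor = PROCESSORS.get(content_type, PROCESSORS["generic"])
    return processor(content)


print(process("equation", {"latex": "E = mc^2"}))
print(process("audio", {"path": "clip.wav"}))  # no audio processor, so generic is used
```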

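4. Multimodal knowledge graph index

Goal: turn document content into a structured semantic representation to improve retrieval.

Core functions:

•Multimodal entity extraction: converts table titles, image objects, and similar elements into knowledge graph entities with semantic annotations.
•Cross-modal relation mapping: infers semantic connections between text and multimodal elements (for example, an image and its caption).
•Hierarchy preservation: maintains the document structure through chains of "belongs to" relations.
•Weighted relation scoring: assigns scores based on semantic proximity and contextual importance.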
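
To make the graph structure above concrete, here is a small in-memory sketch with typed nodes, weighted edges, and a "belongs to" chain. Everything in it is an assumption for illustration: the `Graph` class, the relation label `belongs_to`, the node names, and the edge weights are invented, and the real system stores its graph through LightRAG rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny in-memory graph; node and relation names are illustrative only."""
    nodes: dict = field(default_factory=dict)  # name -> modality
    edges: list = field(default_factory=list)  # (source, relation, target, weight)

    def add_node(self, name: str, modality: str) -> None:
        self.nodes[name] = modality

    def add_edge(self, source: str, relation: str, target: str, weight: float) -> None:
        self.edges.append((source, relation, target, weight))

    def belongs_to_chain(self, name: str) -> list:
        """Follow 'belongs_to' edges upward to recover the document hierarchy."""
        parents = {s: t for s, r, t, _ in self.edges if r == "belongs_to"}
        chain = [name]
        # Stop at the root, or if a cycle would revisit a node.
        while chain[-1] in parents and parents[chain[-1]] not in chain:
            chain.append(parents[chain[-1]])
        return chain


g = Graph()
g.add_node("paper.pdf", "document")
g.add_node("Section 3", "text")
g.add_node("Figure 1", "image")
g.add_node("Figure 1 caption", "text")
g.add_edge("Section 3", "belongs_to", "paper.pdf", 1.0)
g.add_edge("Figure 1", "belongs_to", "Section 3", 1.0)
# Cross-modal relation with a placeholder weight standing in for a relevance score.
g.add_edge("Figure 1 caption", "describes", "Figure 1", 0.9)

print(g.belongs_to_chain("Figure 1"))
```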

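5. Modality-aware retrieval

Goal: precise, context-aware multimodal retrieval.

Core mechanisms:

•Vector-graph fusion: combines vector similarity search (FAISS) with graph traversal to cover both semantic and structural information.
•Modality-aware ranking: adapts the ranking of results to the modality preference of the query.
•Relational coherence: keeps the retrieved results semantically and structurally consistent.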
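
The three mechanisms above can be sketched on a toy index. This is a simplified sketch, not the system's retrieval code: brute-force cosine similarity stands in for FAISS, a one-hop neighbour bonus stands in for graph traversal, and the embeddings, the bonus values, and the names `INDEX`, `NEIGHBOURS`, and `hybrid_search` are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy index: id -> (embedding, modality). Values are invented for illustration.
INDEX = {
    "chunk_text": ([1.0, 0.0], "text"),
    "fig_1": ([0.8, 0.6], "image"),
    "table_2": ([0.0, 1.0], "table"),
}
# Graph neighbours, e.g. a text chunk linked to the figure it discusses.
NEIGHBOURS = {"chunk_text": ["fig_1"], "fig_1": ["chunk_text"], "table_2": []}


def hybrid_search(query_vec, preferred_modality=None, top_k=2,
                  graph_bonus=0.1, modality_bonus=0.2):
    # 1) Vector step: score every item by cosine similarity.
    scores = {key: cosine(query_vec, vec) for key, (vec, _) in INDEX.items()}
    # 2) Graph step: one-hop neighbours of the best vector hit get a small bonus.
    best = max(scores, key=scores.get)
    for neighbour in NEIGHBOURS[best]:
        scores[neighbour] += graph_bonus
    # 3) Modality-aware step: boost items matching the preferred modality.
    if preferred_modality:
        for key, (_, modality) in INDEX.items():
            if modality == preferred_modality:
                scores[key] += modality_bonus
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(hybrid_search([1.0, 0.0]))
print(hybrid_search([1.0, 0.0], preferred_modality="image"))
```

With the same query vector, stating a preference for images moves `fig_1` ahead of the text chunk, which is the effect modality-aware ranking is meant to have.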

Quick start

RAG-Anything is easy to install, either from PyPI (recommended) or from source.

1. Install from PyPI (recommended)

pip install raganything

2. Install from source

git clone https://github.com/HKUDS/RAG-Anything.git
cd RAG-Anything
pip install -e .

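Next, check that MinerU is installed:

# Verify the installation
mineru --version

# Check that it is configured correctly
python -c "from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU is installed correctly' if rag.check_mineru_installation() else '❌ MinerU installation has a problem')"

Models are downloaded automatically on first use.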
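
A lighter pre-flight check is also possible without importing the package at all. The sketch below uses only the standard library to see whether a `raganything` module can be found and whether a `mineru` executable is on the PATH. The function name `check_environment` is hypothetical, and the check says nothing about whether MinerU is configured correctly, only whether both pieces can be located.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether the Python package and the MinerU CLI can be located."""
    return {
        # find_spec returns None when the top-level module is not installed.
        "raganything_importable": importlib.util.find_spec("raganything") is not None,
        # shutil.which returns None when no executable of that name is on PATH.
        "mineru_on_path": shutil.which("mineru") is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'missing'}")
```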

Python usage examples

Demo01: End-to-end document processing

import asyncio
from raganything import RAGAnything
from lightrag.llm.openai import openai_complete_if_cache, openai_embed

API_KEY = "your-api-key"  # placeholder: replace with your own key


def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
    # Text-only completion; returns an awaitable, like the lambda it replaces
    return openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],
        api_key=API_KEY,
        **kwargs,
    )


def vision_model_func(prompt, system_prompt=None, history_messages=None, image_data=None, **kwargs):
    # Without image data, fall back to the text-only model
    if not image_data:
        return llm_model_func(prompt, system_prompt, history_messages, **kwargs)
    # Build the message list, adding a system message only when a system prompt is given
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
        ],
    })
    return openai_complete_if_cache(
        "gpt-4o",
        "",
        system_prompt=None,
        history_messages=[],
        messages=messages,
        api_key=API_KEY,
        **kwargs,
    )


def embedding_func(texts):
    return openai_embed(texts, model="text-embedding-3-large", api_key=API_KEY)


async def main():
    # Initialize RAGAnything
    rag = RAGAnything(
        working_dir="./rag_storage",
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
        embedding_dim=3072,
        max_token_size=8192,
    )

    # Process the document
    await rag.process_document_complete(
        file_path="path/to/your/document.pdf",
        output_dir="./output",
        parse_method="auto",
    )

    # Query the processed content
    result = await rag.query_with_multimodal(
        "What are the main findings shown in the charts?",
        mode="hybrid",
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

Demo02: Direct multimodal content processing

import asyncio
from lightrag import LightRAG
from raganything.modalprocessors import ImageModalProcessor, TableModalProcessor


async def process_multimodal_content():
    # Initialize LightRAG
    rag = LightRAG(
        working_dir="./rag_storage",
        # ... your LLM and embedding configuration
    )
    await rag.initialize_storages()

    # Process an image
    # your_vision_model_func is a placeholder for your own vision model function
    image_processor = ImageModalProcessor(
        lightrag=rag,
        modal_caption_func=your_vision_model_func,
    )

    image_content = {
        "img_path": "path/to/image.jpg",
        "img_caption": ["Figure 1: Experimental results"],
        "img_footnote": ["Data collected in 2024"],
    }

    description, entity_info = await image_processor.process_multimodal_content(
        modal_content=image_content,
        content_type="image",
        file_path="research_paper.pdf",
        entity_name="Experimental results figure",
    )

    # Process a table
    # your_llm_model_func is a placeholder for your own LLM function
    table_processor = TableModalProcessor(
        lightrag=rag,
        modal_caption_func=your_llm_model_func,
    )

    table_content = {
        "table_body": """
        | Method | Accuracy | F1 score |
        |--------|----------|----------|
        | RAGAnything | 95.2% | 0.94 |
        | Baseline | 87.3% | 0.85 |
        """,
        "table_caption": ["Performance comparison"],
        "table_footnote": ["Results on the test dataset"],
    }

    description, entity_info = await table_processor.process_multimodal_content(
        modal_content=table_content,
        content_type="table",
        file_path="research_paper.pdf",
        entity_name="Performance results table",
    )


if __name__ == "__main__":
    asyncio.run(process_multimodal_content())
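
When the table comes from code rather than from a hand-written string, a small helper can assemble the dictionary. The sketch below builds a Markdown table body and wraps it in the `table_body`, `table_caption`, and `table_footnote` keys used in Demo02. The helper name `build_table_content` and the sample rows are hypothetical, and the assumption that a Markdown string is an acceptable `table_body` rests only on the demo above.

```python
def build_table_content(headers, rows, caption, footnote):
    """Build a dict with the table_body / table_caption / table_footnote keys from Demo02."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("|" + "|".join("---" for _ in headers) + "|")  # Markdown separator row
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return {
        "table_body": "\n".join(lines),
        "table_caption": [caption],
        "table_footnote": [footnote],
    }


# Placeholder data, used only to show the resulting shape.
table_content = build_table_content(
    headers=["Item", "Count"],
    rows=[["A", 1], ["B", 2]],
    caption="Example table",
    footnote="Placeholder values",
)
print(table_content["table_body"])
```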

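Demo03: Batch processing

# Process multiple documents
# (run inside an async function, with rag initialized as in Demo01)
await rag.process_folder_complete(
    folder_path="./documents",
    output_dir="./output",
    file_extensions=[".pdf", ".docx", ".pptx"],
    recursive=True,
    max_workers=4,
)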
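
To preview which files a call like this would cover, the selection by extension can be approximated with `pathlib`. This is a sketch of the selection logic only, written under the assumption that `file_extensions` and `recursive` mean what their names suggest. `collect_files` is a hypothetical helper, not a RAG-Anything function, and the library's actual matching rules may differ.

```python
import tempfile
from pathlib import Path


def collect_files(folder_path, file_extensions, recursive=True):
    """Return sorted paths under folder_path whose suffix is in file_extensions."""
    wanted = {ext.lower() for ext in file_extensions}
    pattern = "**/*" if recursive else "*"
    return sorted(
        path for path in Path(folder_path).glob(pattern)
        if path.is_file() and path.suffix.lower() in wanted
    )


# Demonstrate on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.pdf", "b.txt", "sub/c.DOCX"]:
        (root / name).touch()
    found = [p.relative_to(root).as_posix()
             for p in collect_files(root, [".pdf", ".docx"])]
    top_only = [p.relative_to(root).as_posix()
                for p in collect_files(root, [".pdf", ".docx"], recursive=False)]

print(found)
print(top_only)
```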

Demo04: Custom multimodal processor

from raganything.modalprocessors import GenericModalProcessor


class CustomModalProcessor(GenericModalProcessor):
    async def process_multimodal_content(self, modal_content, content_type, file_path, entity_name):
        # Your custom processing logic.
        # analyze_custom_content and create_custom_entity are placeholders you implement yourself.
        enhanced_description = await self.analyze_custom_content(modal_content)
        entity_info = self.create_custom_entity(enhanced_description, entity_name)
        return await self._create_entity_and_chunk(enhanced_description, entity_info, file_path)

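Demo05: Query options

# Different query modes
# (run inside an async function, with rag initialized as in Demo01)
result_hybrid = await rag.query_with_multimodal("Your question", mode="hybrid")
result_local = await rag.query_with_multimodal("Your question", mode="local")
result_global = await rag.query_with_multimodal("Your question", mode="global")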

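The project also ships demos of real scenarios. The examples/ directory contains complete usage examples:

raganything_example.py: end-to-end document processing based on MinerU
modalprocessors_example.py: direct multimodal content processing
office_document_test.py: Office document parsing test (no API key required)
image_format_test.py: image format parsing test (no API key required)
text_format_test.py: text format parsing test (no API key required)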

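Final thoughts

With RAG systems proliferating, RAG-Anything is one of the few open-source RAG systems that sets out to cover the whole pipeline.

From structured extraction to multimodal fusion, and from retrieval to question answering, it supports multiple document formats, analyzes their content, builds a knowledge graph, and retrieves and uses information at the level of contextual semantics.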