RAG fails on global questions directed at an entire text corpus, such as “What are the main themes in the dataset?”, since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task.
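The rough idea behind GraphRAG is to first build a knowledge graph from the documents the user uploads (currently only txt is supported), use a community detection algorithm to organize the entities into levels and group them by their relationships, and then have an LLM summarize each group of closely related entities.
Finally, given the user's question, these summaries are summarized further to produce the answer. This is mostly the job of Global Search/Answer, and it suits high-level questions such as "What is the data mainly about?"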
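As a rough illustration of the indexing side described above (graph, then communities, then summaries), here is a minimal sketch. Everything in it is an assumption made for illustration: the triples are invented, connected components stand in for a real community detection algorithm such as Leiden, and `summarize_community` is a placeholder for the LLM call.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples. In GraphRAG an LLM
# extracts entities and relations like these from the uploaded text.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Carol", "lives_in", "Paris"),
    ("Paris", "capital_of", "France"),
]

def build_graph(triples):
    """Build an undirected adjacency map keyed by entity name."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph

def detect_communities(graph):
    """Stand-in for community detection: connected components via DFS."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities

def summarize_community(members, triples):
    """Placeholder for the LLM summary: list the facts inside the community."""
    facts = [f"{s} {r} {t}" for s, r, t in triples if s in members and t in members]
    return "; ".join(facts)

graph = build_graph(TRIPLES)
communities = detect_communities(graph)
summaries = [summarize_community(set(c), TRIPLES) for c in communities]
print(communities)
print(summaries)
```

The point of the sketch is only the shape of the data flow: the summaries are computed once at indexing time, so answering a question later never has to touch the raw documents.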
“Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pre-generate community summaries for all groups of closely-related entities.
[...] first using each community summary to answer the query independently and in parallel, then summarizing all relevant partial answers into a final global answer.”
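The query step quoted above is essentially a map-reduce over community summaries. The following is a minimal sketch under stated assumptions: `answer_from_summary` and `global_answer` are hypothetical names, word overlap stands in for the LLM's relevance judgment, and joining strings stands in for the final LLM summarization.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_from_summary(query, summary):
    """Map step (LLM stand-in): return (score, partial_answer) for one summary."""
    query_terms = set(query.lower().split())
    score = len(query_terms & set(summary.lower().split()))
    return score, f"From this community: {summary}"

def global_answer(query, community_summaries, max_partials=2):
    # Map: each community summary answers the query independently, in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: answer_from_summary(query, s),
                                 community_summaries))
    # Filter: keep only partial answers that are relevant to the query.
    relevant = sorted((p for p in partials if p[0] > 0),
                      key=lambda p: p[0], reverse=True)
    # Reduce: a real system would ask the LLM to summarize these partial
    # answers; here they are simply joined.
    return " | ".join(text for _score, text in relevant[:max_partials])

example_summaries = [
    "Acme employs Alice and Bob",
    "Paris is the capital of France",
    "Recipes for bread",
]
result = global_answer("who works at acme", example_summaries)
print(result)
```

Because the map calls do not depend on each other, they parallelize trivially, which is what makes this approach workable when there are many communities.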
community hierarchy
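The community hierarchy is the layered community structure that entities form in the knowledge graph. It reflects how entities relate to each other and how they are organized, and typically includes:
Top-level communities: the broadest categories or themes (root communities at level 0)
Mid-level communities: more specific subcategories or subtopics (sub-communities at level 1)
Bottom-level communities: the most fine-grained groupings, which usually contain the concrete entities directly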
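One way to picture the levels listed above is as a tree of communities. The sketch below is not GraphRAG's actual data model; the `Community` class, its fields, and the example titles are all hypothetical, chosen only to show how root communities (level 0) contain sub-communities, which in turn contain the entities.

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    title: str
    level: int  # 0 = root community, 1 = sub-community, and so on
    entities: list = field(default_factory=list)
    children: list = field(default_factory=list)

def communities_at_level(root_communities, level):
    """Collect every community found at the given depth of the hierarchy."""
    found, frontier = [], list(root_communities)
    while frontier:
        community = frontier.pop(0)
        if community.level == level:
            found.append(community)
        else:
            frontier.extend(community.children)
    return found

def all_entities(community):
    """Entities held by a community plus those of all its descendants."""
    collected = list(community.entities)
    for child in community.children:
        collected.extend(all_entities(child))
    return collected

# Hypothetical three-level hierarchy.
staff = Community("Acme staff", 2, entities=["Alice", "Bob"])
products = Community("Acme products", 2, entities=["Widget"])
acme = Community("Acme", 1, children=[staff, products])
root = Community("Companies", 0, children=[acme])

print([c.title for c in communities_at_level([root], 2)])
print(all_entities(root))
```

Picking a level is then a trade-off: summaries of level 0 communities are few and broad, while summaries of lower levels are more numerous and more detailed.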