
Spoon-Feeding Tutorial! First on the Web: Visualizing a GraphRAG Index with Neo4j


GraphRAG adds global retrieval capability to RAG by incorporating a knowledge graph. In this post I'll walk through how to visualize GraphRAG index results in Neo4j so they can be processed and analyzed further. As before, the entities extracted from the novel 《仙逆》 serve as the example; a picture is worth a thousand words. The post has four sections: installing Neo4j, importing the GraphRAG index files, visual analysis in Neo4j, and a summary. All the pitfalls have already been cleared for you, so dig in with confidence.

Neo4j[1] is a graph database management system developed by Neo4j Inc. and a leader in the graph database space: powerful native graph storage, data science and analytics, enterprise-grade security, and the ability to scale transactional and analytical workloads without constraints. It has been downloaded more than 160 million times. The data elements Neo4j stores are nodes, the edges connecting them, and properties on both nodes and edges.
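As a tiny illustration of that data model (the Character label and its properties below are made up for this example, not something GraphRAG produces), two nodes, a typed relationship, and their properties look like this in Cypher:

// two labeled nodes with properties, joined by a typed relationship that also carries a property
CREATE (a:Character {name: '王林'})-[:KNOWS {since: 'chapter 1'}]->(b:Character {name: '铁柱'})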

1. Installing Neo4j

Neo4j can be used as a managed cloud service or as the locally hosted open-source Community edition. Start a Neo4j instance with the following Docker command:

docker run \
    -p 7474:7474 -p 7687:7687 \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4J_PLUGINS=\[\"apoc\"\] \
    neo4j:5.21.2

Open http://localhost:7474/ in a browser and log in with the default username neo4j and default password neo4j; you will be prompted to set a new password after the first login.
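If you would rather skip the reset prompt entirely, the official neo4j Docker image also accepts the NEO4J_AUTH environment variable to set the initial credentials; add one more flag to the docker run command above (the password here is a placeholder):

    -e NEO4J_AUTH=neo4j/your-strong-password \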

Next, install the Python dependencies for Neo4j:

pip install --quiet pandas neo4j-rust-ext

2. Importing the GraphRAG Index Results

To better support Chinese extraction, this run uses the deepseek-chat model from deepseeker[2] (why not qwen2? Because my free quota ran out). Signing up gives you 5 million free tokens, and the index built successfully in one pass. The model supports a 128K context window with a maximum output of 4096 tokens, so when configuring the LLM be sure to set max_tokens to 4096. TPM and RPM limits are not explicitly documented; the platform adjusts them automatically according to load.
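For reference, the LLM section of GraphRAG's settings.yaml might look roughly like the sketch below; the api_base, model name, and key names follow DeepSeek's OpenAI-compatible API and the GraphRAG default template, so treat them as assumptions and adjust for your GraphRAG version:

llm:
  api_key: ${GRAPHRAG_API_KEY}   # your DeepSeek API key, read from an environment variable
  type: openai_chat              # DeepSeek exposes an OpenAI-compatible chat endpoint
  model: deepseek-chat
  api_base: https://api.deepseek.com/v1
  max_tokens: 4096               # DeepSeek's maximum output tokens
  # tokens_per_minute / requests_per_minute are not documented; keep the defaults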

import pandas as pd
from neo4j import GraphDatabase
import time

NEO4J_URI = "neo4j://localhost"  # or neo4j+s://xxxx.databases.neo4j.io
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"      # your own password
NEO4J_DATABASE = "neo4j"

# Create a Neo4j driver
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

GRAPHRAG_FOLDER = "./output/20240716-192226/artifacts"

In Neo4j, indexes are only used to find the starting points of a graph query, for example to quickly locate the two nodes to be connected. Constraints are used to avoid duplicates and are created mainly on the id of each node type. We use labels wrapped in double underscores as markers to distinguish them from the actual entity types.

# constraint names must be unique per database, so each node type gets its own name
statements = """
create constraint chunk_id if not exists for (c:__Chunk__) require c.id is unique;
create constraint document_id if not exists for (d:__Document__) require d.id is unique;
create constraint community_id if not exists for (c:__Community__) require c.community is unique;
create constraint entity_id if not exists for (e:__Entity__) require e.id is unique;
create constraint entity_title if not exists for (e:__Entity__) require e.name is unique;
create constraint covariate_title if not exists for (e:__Covariate__) require e.title is unique;
create constraint related_id if not exists for ()-[rel:RELATED]->() require rel.id is unique;
""".split(";")

for statement in statements:
    if len((statement or "").strip()) > 0:
        print(statement)
        driver.execute_query(statement)
def batched_import(statement, df, batch_size=1000):
    """
    Import a dataframe into Neo4j using a batched approach.
    Parameters: statement is the Cypher query to execute, df is the dataframe to import,
    and batch_size is the number of rows to import in each batch.
    """
    total = len(df)
    start_s = time.time()
    for start in range(0, total, batch_size):
        batch = df.iloc[start:min(start + batch_size, total)]
        result = driver.execute_query("UNWIND $rows AS value " + statement,
                                      rows=batch.to_dict('records'),
                                      database_=NEO4J_DATABASE)
        print(result.summary.counters)
    print(f'{total} rows in {time.time() - start_s} s.')
    return total
doc_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_documents.parquet', columns=["id", "title"])
doc_df.head(2)

# import documents
statement = """
MERGE (d:__Document__ {id: value.id})
SET d += value {.title}
"""

batched_import(statement, doc_df)
text_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_text_units.parquet',
                          columns=["id", "text", "n_tokens", "document_ids"])
text_df.head(2)

statement = """
MERGE (c:__Chunk__ {id: value.id})
SET c += value {.text, .n_tokens}
WITH c, value
UNWIND value.document_ids AS document
MATCH (d:__Document__ {id: document})
MERGE (c)-[:PART_OF]->(d)
"""

batched_import(statement, text_df)
entity_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_entities.parquet',
                            columns=["name", "type", "description", "human_readable_id", "id",
                                     "description_embedding", "text_unit_ids"])
entity_df.head(2)

entity_statement = """
MERGE (e:__Entity__ {id: value.id})
SET e += value {.human_readable_id, .description, name: replace(value.name, '"', '')}
WITH e, value
CALL db.create.setNodeVectorProperty(e, "description_embedding", value.description_embedding)
CALL apoc.create.addLabels(e, case when coalesce(value.type, "") = "" then [] else [apoc.text.upperCamelCase(replace(value.type, '"', ''))] end) yield node
UNWIND value.text_unit_ids AS text_unit
MATCH (c:__Chunk__ {id: text_unit})
MERGE (c)-[:HAS_ENTITY]->(e)
"""

batched_import(entity_statement, entity_df)
rel_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_relationships.parquet',
                         columns=["source", "target", "id", "rank", "weight", "human_readable_id",
                                  "description", "text_unit_ids"])
rel_df.head(2)

rel_statement = """
MATCH (source:__Entity__ {name: replace(value.source, '"', '')})
MATCH (target:__Entity__ {name: replace(value.target, '"', '')})
// not necessary to merge on id as there is only one relationship per pair
MERGE (source)-[rel:RELATED {id: value.id}]->(target)
SET rel += value {.rank, .weight, .human_readable_id, .description, .text_unit_ids}
RETURN count(*) as createdRels
"""

batched_import(rel_statement, rel_df)
community_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_communities.parquet',
                               columns=["id", "level", "title", "text_unit_ids", "relationship_ids"])

community_df.head(2)

statement = """
MERGE (c:__Community__ {community: value.id})
SET c += value {.level, .title}
/*
UNWIND value.text_unit_ids as text_unit_id
MATCH (t:__Chunk__ {id: text_unit_id})
MERGE (c)-[:HAS_CHUNK]->(t)
WITH distinct c, value
*/
WITH *
UNWIND value.relationship_ids as rel_id
MATCH (start:__Entity__)-[:RELATED {id: rel_id}]->(end:__Entity__)
MERGE (start)-[:IN_COMMUNITY]->(c)
MERGE (end)-[:IN_COMMUNITY]->(c)
RETURN count(distinct c) as createdCommunities
"""

batched_import(statement, community_df)
community_report_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/create_final_community_reports.parquet',
                                      columns=["id", "community", "level", "title", "summary", "findings",
                                               "rank", "rank_explanation", "full_content"])
community_report_df.head(2)

# import community reports
community_statement = """
MATCH (c:__Community__ {community: value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
SET f += finding
"""
batched_import(community_statement, community_report_df)
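With everything loaded, it is good practice to close the driver so its connection pool is released:

# all imports are done; release the driver's connections
driver.close()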

With documents, text units, entities, relationships, communities, and community reports all imported, we can now open the browser and visually analyze how these entities, relationships, and communities connect. Here we go!

3. Visual Analysis

Open your browser and go to http://localhost:7474/browser/.
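As a quick sanity check before exploring, you can count the imported nodes per label directly in the query bar; this is plain Cypher, nothing GraphRAG-specific:

// how many nodes of each label did the import create?
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC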

Every entity can be clicked open to explore its further connections; the relationship between 王林 and 铁柱, for example, is clear at a glance.
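The same picture can be reproduced as a query; the sketch below assumes entities named 王林 and 铁柱 exist in your graph after the import, so substitute names from your own data:

// paths of up to two RELATED hops between the two characters
MATCH p = (a:__Entity__ {name: '王林'})-[:RELATED*1..2]-(b:__Entity__ {name: '铁柱'})
RETURN p
LIMIT 25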

There are many communities, and each one essentially consolidates a specific event, for example which characters and which tests are tied to the testing event.
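To see which entities each community groups together, you can query the IN_COMMUNITY relationships created during the import:

// largest communities with a sample of their member entities
MATCH (e:__Entity__)-[:IN_COMMUNITY]->(c:__Community__)
RETURN c.title AS community, count(e) AS entity_count, collect(e.name)[..10] AS sample_entities
ORDER BY entity_count DESC
LIMIT 20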

Clicking into the 洞穴 (cave) node lets you drill down into the entities, characters, and text units associated with that cave.
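The text units behind any entity can be pulled the same way; the entity name below is only an illustration, so use the name of whichever node you clicked on:

// chunks that mention a given entity, with a short text preview
MATCH (c:__Chunk__)-[:HAS_ENTITY]->(e:__Entity__ {name: '洞穴'})
RETURN e.name AS entity, c.id AS chunk_id, left(c.text, 100) AS preview
LIMIT 10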

4. Summary

By using Neo4j to visualize and analyze the GraphRAG index results, this post gives a much more intuitive view of what the whole index contains. If you need the complete script, send the message "neo4j" to receive it.

[2] deepseeker: https://platform.deepseek.com/






