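Given a task, the agent can autonomously explore a knowledge graph, or any other database,
and retrieve the exact data needed to solve the problem.
Its answers are grounded in data actually retrieved from the store, not in fuzzy similarity matches and not in fabrication.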
sysdm.cpl
neo4j console
conda create -n agent python=3.10
conda activate agent
pip install -r requirements.txt
jupyter lab
# Connect to your local Neo4j instance
import os

# Import path assumes langchain_community; newer releases also provide Neo4jGraph in langchain_neo4j
from langchain_community.graphs import Neo4jGraph

os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

graph = Neo4jGraph(refresh_schema=False)

# Enforce a unique id on every node type used by the graph
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Chunk) REQUIRE c.id IS UNIQUE")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (c:AtomicFact) REQUIRE c.id IS UNIQUE")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (c:KeyElement) REQUIRE c.id IS UNIQUE")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (d:Document) REQUIRE d.id IS UNIQUE")
# Define a PyPDFDirectoryLoader instance
from langchain_community.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader(
    path="E:\\neo4j000\\data",
    glob="**/[!.]*.pdf",
    silent_errors=False,
    load_hidden=False,
    recursive=False,
    extract_images=False,
    password=None,
    mode="page",
    headers=None,
    extraction_mode="plain",
    # extraction_kwargs=None,
)

# Load the PDF files
documents = loader.load()

# Print the loaded documents
for doc in documents:
    print(doc)
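With mode="page" the loader returns one document per PDF page, whereas the ingestion step later in this post is called with a single string named text. The post does not show how the two are connected. A minimal sketch, assuming the pages are simply concatenated in order, could look like the following. The helper name pages_to_text and the sample_pages stand-ins are hypothetical and are not part of the original notebook.

```python
from types import SimpleNamespace


def pages_to_text(documents, separator="\n"):
    """Join the page_content of each loaded page into one string (assumed approach)."""
    return separator.join(doc.page_content for doc in documents)


# Stand-in objects so the sketch runs without LangChain or any PDF files;
# LangChain Document objects expose the same page_content attribute.
sample_pages = [
    SimpleNamespace(page_content="First page."),
    SimpleNamespace(page_content="Second page."),
]

text = pages_to_text(sample_pages)
print(text)
```

In the notebook, the same helper would be called on the documents list returned by loader.load().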
# Define the prompt that extracts the knowledge graph's key elements and atomic facts
from langchain_core.prompts import ChatPromptTemplate

construction_system = """You are now an intelligent assistant tasked with meticulously extracting both key elements and
atomic facts from a long text.
1. Key Elements: The essential nouns (e.g., characters, times, events, places, numbers), verbs (e.g.,
actions), and adjectives (e.g., states, feelings) that are pivotal to the text’s narrative.
2. Atomic Facts: The smallest, indivisible facts, presented as concise sentences. These include
propositions, theories, existences, concepts, and implicit elements like logic, causality, event
sequences, interpersonal relationships, timelines, etc.
Requirements:
#####
1. Ensure that all identified key elements are reflected within the corresponding atomic facts.
2. You should extract key elements and atomic facts comprehensively, especially those that are
important and potentially query-worthy and do not leave out details.
3. Whenever applicable, replace pronouns with their specific noun counterparts (e.g., change I, He,
She to actual names).
4. Ensure that the key elements and atomic facts you extract are presented in the same language as
the original text (e.g., English or Chinese).
"""

construction_human = """Use the given format to extract information from the
following input: {input}"""

construction_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", construction_system),
        (
            "human",
            (
                "Use the given format to extract information from the "
                "following input: {input}"
            ),
        ),
    ]
)
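In plain terms, the system prompt casts the model as an assistant that extracts two things from a long text. Key elements are the nouns (people, times, events, places, numbers), verbs (actions), and adjectives (states, feelings) that the narrative depends on. Atomic facts are the smallest indivisible facts, written as concise sentences, including implicit ones such as logic, causality, event order, relationships between people, and timelines. The prompt sets four requirements: every key element must appear in its atomic fact; extraction must be thorough, especially for content likely to be queried; pronouns are replaced by the specific nouns they refer to wherever possible; and the output stays in the language of the source text.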
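To make the target of the prompt concrete, here is a rough sketch of the shape one extraction result might take. The field names atomic_facts and atomic_fact match the keys read by the ingestion code in this post; key_elements, the class names, and the sample sentence are assumptions for illustration. A chain with structured output would more likely use a Pydantic model; plain dataclasses are used here only so the sketch has no dependencies.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AtomicFact:
    # key_elements is an assumed field name; atomic_fact matches the ingestion code
    key_elements: List[str]
    atomic_fact: str


@dataclass
class Extraction:
    # One chunk of text yields a list of atomic facts
    atomic_facts: List[AtomicFact] = field(default_factory=list)


example = Extraction(
    atomic_facts=[
        AtomicFact(
            key_elements=["Marie Curie", "Nobel Prize", "1903"],
            atomic_fact="Marie Curie won the Nobel Prize in Physics in 1903.",
        )
    ]
)

# Convert to plain dicts, the form the ingestion step iterates over
record = asdict(example)
print(record["atomic_facts"][0]["atomic_fact"])
```

Note how the sample respects the first requirement of the prompt: each key element appears verbatim in its atomic fact.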
# Chunk size of 2k tokens
import asyncio
from datetime import datetime

from langchain_text_splitters import TokenTextSplitter

# construction_chain, import_query and encode_md5 are assumed to be defined elsewhere in the notebook


async def process_document(text, document_name, chunk_size=2000, chunk_overlap=200):
    start = datetime.now()
    print(f"Started extraction at: {start}")
    text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    texts = text_splitter.split_text(text)
    print(f"Total text chunks: {len(texts)}")
    # Run the extraction chain on all chunks concurrently
    tasks = [
        asyncio.create_task(construction_chain.ainvoke({"input": chunk_text}))
        for index, chunk_text in enumerate(texts)
    ]
    results = await asyncio.gather(*tasks)
    print(f"Finished LLM extraction after: {datetime.now() - start}")
    docs = [el.dict() for el in results]
    # Attach the chunk text, its position, and content-derived ids
    for index, doc in enumerate(docs):
        doc['chunk_id'] = encode_md5(texts[index])
        doc['chunk_text'] = texts[index]
        doc['index'] = index
        for af in doc["atomic_facts"]:
            af["id"] = encode_md5(af["atomic_fact"])
    # Import chunks / atomic facts / key elements
    graph.query(import_query, params={"data": docs, "document_name": document_name})
    # Create NEXT relationships between consecutive chunks
    graph.query(
        """MATCH (c:Chunk)<-[:HAS_CHUNK]-(d:Document)
WHERE d.id = $document_name
WITH c ORDER BY c.index
WITH collect(c) AS nodes
UNWIND range(0, size(nodes) - 2) AS index
WITH nodes[index] AS start, nodes[index + 1] AS end
MERGE (start)-[:NEXT]->(end)
""",
        params={"document_name": document_name},
    )
    print(f"Finished import at: {datetime.now() - start}")


# Top-level await works inside a Jupyter notebook cell
await process_document(text, "wse2", chunk_size=2000, chunk_overlap=100)
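Two details of the ingestion step are easy to miss: ids are derived from content via encode_md5, and the final Cypher query links each chunk to its successor. Neither the body of encode_md5 nor a plain-Python equivalent of the query appears in this post, so the sketch below is an assumption about how they might behave. The encode_md5 definition is a plausible guess based on its name, and consecutive_pairs is a hypothetical helper that mirrors the UNWIND range(0, size(nodes) - 2) pattern.

```python
import hashlib


def encode_md5(text: str) -> str:
    """Assumed definition: hex MD5 digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def consecutive_pairs(items):
    """Pair items[i] with items[i + 1], like the NEXT query does for ordered chunks."""
    # Cypher's range(0, n - 2) is inclusive, so it yields n - 1 indices,
    # the same as Python's range(n - 1)
    return [(items[i], items[i + 1]) for i in range(len(items) - 1)]


chunks = ["alpha", "beta", "gamma"]
chunk_ids = [encode_md5(c) for c in chunks]
next_links = consecutive_pairs(chunk_ids)
print(len(next_links))  # 2 links for 3 chunks
```

Because the id is a hash of the text, re-importing an identical chunk or atomic fact produces the same id, which fits the uniqueness constraints created at the start. A document with a single chunk produces no NEXT links at all.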