|
In the previous article we gave a minimal introduction to AI + knowledge graphs: we used LangChain to create and query a knowledge graph. The graph there was built from hard-coded sample data, though, so it only demonstrated the feature and had no practical value. This article shows how to use AI to convert your own knowledge base into a knowledge graph automatically.

Code reference: https://colab.research.google.com/drive/1OpoLyKAWTVpkhy0VgVduprYypIFTSIrL#scrollTo=TtlKi-4r8grL

0. Result

Let's look at the end result first: you feed in some knowledge, and the LLM automatically extracts the entities, relations, and attributes from it and generates a knowledge graph. To make the result easier to inspect, the code also visualizes the generated knowledge graph.

(1) The knowledge graph structure extracted from a piece of knowledge:

[Figure]

(2) The visualized knowledge graph:

[Figure]

1. Code walkthrough

1.1 Identifying the knowledge graph structure

The most important step in turning knowledge into a knowledge graph is extracting the entities, relations, and attributes it contains. This is also the main part of the code, and it is implemented through a prompt.

(1) The prompt

    # Prompt template for knowledge triple extraction
    _DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
        "You are a networked intelligence helping a human track knowledge triples"
        " about all relevant people, things, concepts, etc. and integrating"
        " them with your knowledge stored within your weights"
        " as well as that stored in a knowledge graph."
        " Extract all of the knowledge triples from the text."
        " A knowledge triple is a clause that contains a subject, a predicate,"
        " and an object. The subject is the entity being described,"
        " the predicate is the property of the subject that is being"
        " described, and the object is the value of the property.\n\n"
        "EXAMPLE\n"
        "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
        f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
        f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "I'm going to the store.\n\n"
        "Output: NONE\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
        f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "{text}"
        "Output:"
    )
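The main job of this prompt is to have the LLM extract knowledge triples from the input text. Each triple has the form (subject, predicate, object): the entity being described, the relation or property, and the value of that property. It is a bit like picking out the subject, verb, and object of a sentence. The prompt includes several worked examples; this few-shot approach helps the model understand what output is expected, including the case where there is nothing to extract (NONE).

(2) Basic flow for calling the LLM

    KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
        input_variables=["text"],
        template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
    )

    llm = ChatOpenAI(temperature=0.9)

    # Create an LLMChain using the knowledge triple extraction prompt
    chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

    # Run the chain with the specified text
    text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
    triples = chain.invoke(
        {'text': text}
    ).get('text')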
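To see what the model actually receives, it can help to render a few-shot prompt of the same shape without calling any API. The sketch below uses only the standard library and a shortened template. It assumes that `KG_TRIPLE_DELIMITER` is `"<|>"` (check the value in your LangChain version); `DELIMITER`, `TEMPLATE`, and `render_prompt` are names made up for this illustration.

```python
# Assumption: "<|>" is the value of KG_TRIPLE_DELIMITER in your LangChain version.
DELIMITER = "<|>"

# A shortened template with the same shape as the real one:
# instruction, few-shot examples, then the new input and an open "Output:" slot.
TEMPLATE = (
    "Extract all of the knowledge triples from the text.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){DELIMITER}(Nevada, is in, US)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}\n"
    "Output:"
)


def render_prompt(text: str) -> str:
    """Fill the single {text} slot, much as a prompt template would."""
    return TEMPLATE.format(text=text)


prompt = render_prompt("The Eiffel Tower is a famous landmark in Paris.")
print(prompt)
```

Because the prompt ends with an open `Output:` line right after the user's text, the model is nudged to continue in the same `(subject, predicate, object)` format that the examples use.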
(3) Parsing the result

    def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
        if not response:
            return []
        return response.split(delimiter)

    triples_list = parse_triples(triples)

    pprint(triples_list)
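The parser above returns raw strings such as "(Paris, is the capital of, France)". If you want structured tuples instead, a slightly stricter parser might look like the following. This is a minimal sketch, not the notebook's code: it assumes the delimiter is `"<|>"` and that the model follows the `(subject, predicate, object)` format from the prompt; `parse_structured_triples` is a hypothetical helper name.

```python
from typing import List, Tuple

# Assumption: "<|>" matches KG_TRIPLE_DELIMITER in your LangChain version.
DELIMITER = "<|>"

Triple = Tuple[str, str, str]


def parse_structured_triples(response: str, delimiter: str = DELIMITER) -> List[Triple]:
    """Turn raw model output into (subject, predicate, object) tuples."""
    # The prompt tells the model to answer NONE when there is nothing to extract.
    if not response or response.strip().upper() == "NONE":
        return []
    triples = []
    for chunk in response.split(delimiter):
        chunk = chunk.strip()
        # Drop one pair of surrounding parentheses, if present.
        if chunk.startswith("(") and chunk.endswith(")"):
            chunk = chunk[1:-1]
        # Split on the first two commas only, so commas inside the object survive.
        parts = [p.strip() for p in chunk.split(",", 2)]
        # Skip anything that is not a complete triple.
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


sample = "(Paris, is the capital of, France)<|>(Eiffel Tower, is a landmark in, Paris)<|>broken"
print(parse_structured_triples(sample))
```

Malformed chunks are skipped rather than raising, which is a design choice: LLM output is not guaranteed to follow the format, and one bad triple should not break the whole graph.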
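(4) Example output:

[Figure]

1.2 Visualizing the knowledge graph

The sample code builds its visualization UI with the gradio framework. Visualization is not the focus of this article, so I won't go into detail here; see the full code for the implementation. In short, it uses pyvis and networkx to build a graph structure from the triples extracted above.

After running the code, open the link printed in the output to see the visualization.

[Figure]

One small question: why use both pyvis and networkx? The code first builds a networkx graph from the triples, then converts the networkx graph into a pyvis network, and finally renders it with pyvis. Is this detour necessary? Couldn't the pyvis network be built directly from the triples? Answers from more experienced readers are welcome!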
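On the question above: pyvis exposes `add_node` and `add_edge` itself, so a network can be built straight from triples. The networkx step is mainly useful if you also want to run graph algorithms on the data, and pyvis additionally offers `Network.from_nx` for the conversion. The following is a sketch under assumptions, not the notebook's approach: it expects triples that are already parsed into tuples, and `collect_nodes` and `build_pyvis_network` are hypothetical helper names. pyvis is imported lazily so the pure-Python part runs without it.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def collect_nodes(triples: Iterable[Triple]) -> List[str]:
    """Return each subject/object once, in first-seen order."""
    seen = {}
    for subject, _, obj in triples:
        seen.setdefault(subject, None)
        seen.setdefault(obj, None)
    return list(seen)


def build_pyvis_network(triples: List[Triple]):
    """Build a pyvis Network straight from triples, without networkx."""
    from pyvis.network import Network  # lazy import; requires `pip install pyvis`

    net = Network(directed=True, notebook=True, cdn_resources="remote")
    # pyvis requires both endpoints to exist before an edge is added.
    for node in collect_nodes(triples):
        net.add_node(node, label=node)
    for subject, predicate, obj in triples:
        net.add_edge(subject, obj, label=predicate)
    return net


example = [
    ("Paris", "is the capital of", "France"),
    ("Eiffel Tower", "is a landmark in", "Paris"),
]
print(collect_nodes(example))
```

With pyvis installed, `build_pyvis_network(example).generate_html()` should produce the same kind of HTML that the full code embeds in its gradio interface.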
2. Full code

Here is the complete, directly runnable code (you will still need to install any missing dependencies yourself):

    from langchain.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI
    from langchain.chains import LLMChain
    from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER
    from pprint import pprint

    # Prompt template for knowledge triple extraction
    _DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
        "You are a networked intelligence helping a human track knowledge triples"
        " about all relevant people, things, concepts, etc. and integrating"
        " them with your knowledge stored within your weights"
        " as well as that stored in a knowledge graph."
        " Extract all of the knowledge triples from the text."
        " A knowledge triple is a clause that contains a subject, a predicate,"
        " and an object. The subject is the entity being described,"
        " the predicate is the property of the subject that is being"
        " described, and the object is the value of the property.\n\n"
        "EXAMPLE\n"
        "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
        f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
        f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "I'm going to the store.\n\n"
        "Output: NONE\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
        f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
        "END OF EXAMPLE\n\n"
        "EXAMPLE\n"
        "{text}"
        "Output:"
    )

    KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
        input_variables=["text"],
        template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
    )

    llm = ChatOpenAI(temperature=0.9)

    # Create an LLMChain using the knowledge triple extraction prompt
    chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

    # Run the chain with the specified text
    text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
    triples = chain.invoke(
        {'text': text}
    ).get('text')

    pprint(triples)

    def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
        if not response:
            return []
        return response.split(delimiter)

    triples_list = parse_triples(triples)

    pprint(triples_list)
    from pyvis.network import Network
    import networkx as nx

    def create_graph_from_triplets(triplets):
        G = nx.DiGraph()
        for triplet in triplets:
            # Drop the surrounding parentheses so they don't end up in node names
            parts = triplet.strip().strip('()').split(',')
            if len(parts) != 3:
                continue  # skip output that is not a (subject, predicate, object) triple
            subject, predicate, obj = parts
            G.add_edge(subject.strip(), obj.strip(), label=predicate.strip())
        return G

    def nx_to_pyvis(networkx_graph):
        pyvis_graph = Network(notebook=True, cdn_resources='remote')
        for node in networkx_graph.nodes():
            pyvis_graph.add_node(node)
        for edge in networkx_graph.edges(data=True):
            pyvis_graph.add_edge(edge[0], edge[1], label=edge[2]["label"])
        return pyvis_graph

    def generateGraph():
        # Uses the module-level triples_list produced by the extraction step
        triplets = [t.strip() for t in triples_list if t.strip()]
        graph = create_graph_from_triplets(triplets)
        pyvis_network = nx_to_pyvis(graph)

        pyvis_network.toggle_hide_edges_on_drag(True)
        pyvis_network.toggle_physics(False)
        pyvis_network.set_edge_smooth('discrete')

        html = pyvis_network.generate_html()
        # srcdoc below is single-quoted, so single quotes in the page are replaced
        html = html.replace("'", "\"")

        return f"""<iframe style="width: 100%; height: 600px; margin: 0 auto" name="result" allow="midi; geolocation; microphone; camera;
        display-capture; encrypted-media;" sandbox="allow-modals allow-forms
        allow-scripts allow-same-origin allow-popups
        allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
        allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""

    import gradio as gr

    demo = gr.Interface(
        generateGraph,
        inputs=None,
        outputs='html',
        title="Knowledge Graph",
        allow_flagging='never',
        live=True,
    )

    demo.launch(
        height=800,
        width="100%"
    )
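The full code embeds the generated page in an iframe by swapping single quotes for double quotes before placing it in a single-quoted `srcdoc` attribute. An alternative, shown here only as a sketch and not as what the notebook does, is to escape the page as an attribute value with the standard library's `html.escape`, which leaves the page's own quotes intact once the browser decodes them. `wrap_in_iframe` is a hypothetical helper and the reduced `sandbox` list is an assumption made for brevity.

```python
import html


def wrap_in_iframe(page_html: str, height_px: int = 600) -> str:
    """Embed a full HTML page in an iframe via srcdoc, escaped as an attribute value."""
    # quote=True also escapes double and single quotes, not just &, < and >.
    escaped = html.escape(page_html, quote=True)
    return (
        f'<iframe style="width: 100%; height: {height_px}px; margin: 0 auto" '
        f'sandbox="allow-scripts allow-same-origin" frameborder="0" '
        f'srcdoc="{escaped}"></iframe>'
    )


snippet = wrap_in_iframe("<p class='note'>It's \"quoted\"</p>")
print(snippet)
```

The browser decodes the entities when it reads the attribute, so the framed document sees the original markup, including any single-quoted strings inside its scripts.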
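3. Pitfalls you may run into

If you use the original code from the reference link, you will very likely hit the following problem.

(1) Error: module gradio has no attribute outputs (seen with gradio version 4.16).

Fix: change outputs=gr.outputs.HTML to outputs='html'.

4. Summary

In this article we learned how to use AI to turn knowledge into a knowledge graph structure. The key step is extracting triples from the knowledge, which depends heavily on the prompt and on the capability of the LLM. As a bonus, the code also visualizes the resulting knowledge graph. The example is simple, but the approach is well worth borrowing.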
|