Code reference: https://colab.research.google.com/drive/1OpoLyKAWTVpkhy0VgVduprYypIFTSIrL#scrollTo=TtlKi-4r8grL
0. Result
Let's first look at the final result: you feed in some knowledge, and the LLM automatically extracts the entities, relations, and attributes from it and generates a knowledge graph. To make the result easier to inspect, the code also visualizes the resulting knowledge graph.
(1) The knowledge graph structure extracted from a piece of knowledge:
The most important step in turning knowledge into a knowledge graph is extracting the entities, relations, and attributes from it. This is also the main part of the code, and it is done through a prompt.
(1) The prompt
# Prompt template for knowledge triple extraction
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}"
    "Output:"
)
The main job of this prompt is to have the LLM extract knowledge triples from a sentence, that is, entity, relation, and attribute. It looks a bit like identifying the subject, predicate, and object of a sentence. The prompt includes several examples; this few-shot approach helps the LLM better understand what the user wants.
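To see how the few-shot layout works mechanically, here is a minimal sketch that renders a shortened version of the template with plain Python string formatting instead of LangChain. The `"<|>"` delimiter value, the shortened `TEMPLATE`, and the helper `render_prompt` are assumptions made for illustration; the article's code imports the real delimiter from LangChain and uses the full prompt.

```python
# Assumption: "<|>" mirrors the value of LangChain's KG_TRIPLE_DELIMITER.
KG_TRIPLE_DELIMITER = "<|>"

# A shortened, hypothetical version of the template: one worked example
# followed by an open slot for the new text.
TEMPLATE = (
    "Extract all of the knowledge triples from the text.\n\n"
    "EXAMPLE\n"
    "It's a state in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}"
    "Output:"
)


def render_prompt(text: str) -> str:
    """Fill the {text} slot, leaving the worked example untouched."""
    return TEMPLATE.format(text=text)


if __name__ == "__main__":
    # The model is expected to continue the text after the final "Output:".
    print(render_prompt("Paris is the capital of France.\n"))
```

Only the last `EXAMPLE` block changes between calls. The earlier blocks stay fixed and show the model the exact output format, including the delimiter that the parsing step later splits on.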
(2) Basic flow for calling the LLM
KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)
llm = ChatOpenAI(temperature=0.9)
# Create an LLMChain using the knowledge triple extraction prompt
chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)
# Run the chain with the specified text
text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
triples = chain.invoke(
    {'text': text}
).get('text')
(3) Parsing the result
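def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
    # An empty response yields no triples
    if not response:
        return []
    # Each triple is separated by the delimiter used in the prompt
    return response.split(delimiter)

triples_list = parse_triples(triples)
pprint(triples_list)
(4) Example output: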
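The sample code uses the gradio framework to build the visual interface. Visualization is not the focus of this article, so it is not covered in detail here; see the complete code for the implementation. Roughly, it uses pyvis and networkx to build a graph structure from the triples extracted earlier.
After running it, open the link shown in the output to see the visual interface.
One small question remains: why use both pyvis and networkx? The code first builds a networkx graph from the triples, then converts the networkx graph into a pyvis graph, and finally visualizes it with pyvis. Is that intermediate step necessary? Couldn't the pyvis structure be built directly from the triples? Answers from more experienced readers are welcome!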
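Whichever library ends up drawing the graph, what it needs from the triples is a set of nodes and a list of labelled edges. The following sketch derives both using only the standard library. It is not the Colab's implementation: the helper names, the `"<|>"` delimiter value, and the decision to skip malformed triples are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: "<|>" mirrors the value of LangChain's KG_TRIPLE_DELIMITER.
KG_TRIPLE_DELIMITER = "<|>"

Triple = Tuple[str, str, str]


def parse_triples(response: str, delimiter: str = KG_TRIPLE_DELIMITER) -> List[Triple]:
    """Split a raw model response into (subject, predicate, object) tuples."""
    triples: List[Triple] = []
    if not response or response.strip() == "NONE":
        return triples
    for chunk in response.split(delimiter):
        # Drop surrounding whitespace and the enclosing parentheses.
        body = chunk.strip().strip("()")
        parts = [p.strip() for p in body.split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # skip anything that is not a clean three-part triple
        triples.append((parts[0], parts[1], parts[2]))
    return triples


def to_nodes_and_edges(triples: List[Triple]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Collect unique nodes (in first-seen order) and labelled directed edges."""
    nodes: List[str] = []
    edges: List[Dict[str, str]] = []
    for subject, predicate, obj in triples:
        for node in (subject, obj):
            if node not in nodes:
                nodes.append(node)
        edges.append({"source": subject, "target": obj, "label": predicate})
    return nodes, edges


if __name__ == "__main__":
    raw = "(Paris, is the capital of, France)<|>(Eiffel Tower, is a landmark in, Paris)"
    nodes, edges = to_nodes_and_edges(parse_triples(raw))
    print(nodes)
    print(edges)
```

Each edge dictionary here maps naturally onto one `add_edge` call in a graph library, which suggests the intermediate networkx graph is a convenience rather than a strict requirement.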
Below is the complete, directly runnable code (any missing dependencies still need to be installed first, of course):
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER
from pprint import pprint

# Prompt template for knowledge triple extraction
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}"
    "Output:"
)

KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)

llm = ChatOpenAI(temperature=0.9)

# Create an LLMChain using the knowledge triple extraction prompt
chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

# Run the chain with the specified text
text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
triples = chain.invoke(
    {'text': text}
).get('text')
pprint(triples)

def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
    if not response:
        return []
    return response.split(delimiter)

triples_list = parse_triples(triples)
pprint(triples_list)
from pyvis.network import Network
import networkx as nx

def create_graph_from_triplets(triplets):
    G = nx.DiGraph()
    for triplet in triplets:
        # Strip the enclosing parentheses so they do not end up in node names
        parts = [p.strip() for p in triplet.strip().strip('()').split(',')]
        # Skip responses such as "NONE" or triples containing extra commas
        if len(parts) != 3:
            continue
        subject, predicate, obj = parts
        G.add_edge(subject, obj, label=predicate)
    return G

def nx_to_pyvis(networkx_graph):
    pyvis_graph = Network(notebook=True, cdn_resources='remote')
    for node in networkx_graph.nodes():
        pyvis_graph.add_node(node)
    for edge in networkx_graph.edges(data=True):
        pyvis_graph.add_edge(edge[0], edge[1], label=edge[2]["label"])
    return pyvis_graph

def generateGraph():
    triplets = [t.strip() for t in triples_list if t.strip()]
    graph = create_graph_from_triplets(triplets)
    pyvis_network = nx_to_pyvis(graph)
    pyvis_network.toggle_hide_edges_on_drag(True)
    pyvis_network.toggle_physics(False)
    pyvis_network.set_edge_smooth('discrete')
    html = pyvis_network.generate_html()
    # The HTML is embedded in a single-quoted srcdoc attribute, so replace single quotes
    html = html.replace("'", "\"")
    return f"""<iframe style="width: 100%; height: 600px; margin: 0 auto" name="result" allow="midi; geolocation; microphone; camera;
    display-capture; encrypted-media;" sandbox="allow-modals allow-forms
    allow-scripts allow-same-origin allow-popups
    allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
    allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
import gradio as gr

demo = gr.Interface(
    generateGraph,
    inputs=None,
    outputs='html',
    title="Knowledge Graph",
    allow_flagging='never',
    live=True,
)
demo.launch(
    height=800,
    width="100%"
)
If you use the original code from the reference link, you will most likely run into the following problem.
(1) Error: module 'gradio' has no attribute 'outputs' (gradio version 4.16).
Fix: change outputs=gr.outputs.HTML to outputs='html'.
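In this article we looked at how to use AI to turn knowledge into a knowledge graph structure. The key step is extracting triples from the knowledge, which depends heavily on the prompt and on the capability of the LLM. As a finishing touch, the code also visualizes the resulting knowledge graph. The example is simple, but the approach is well worth borrowing.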