|
One of the very specific problems we often run into with documents, especially legal documents, is the need to establish the hierarchy of the different clauses in a document. This is because clauses sometimes reference other clauses for their full meaning and context.

To get the full context, you need to recursively navigate to and retrieve any clause that is mentioned (even footnotes!): find the mentioned clause through the document hierarchy graph, check whether it mentions further clauses, and repeat. Recursive retrieval can be applied beyond legal documents to other document elements, such as page numbers, multimodal data (e.g., images), and hyperlinks to other documents or external data.

When it comes to graph-based traversal and retrieval, legal documents have always fascinated me because of their structured nature. Legal documents also call for a very specific structured retrieval step, the "definitions page", where every important term is given a specific variable or fixed definition, and these definitions can change depending on the nature of the document.

In this Notebook and example, we want to show how easy it is to create a document hierarchy graph of the clauses in a legal document, a definitions graph for the legal document, and a multi-graph, multi-agent recursive retrieval process that intelligently navigates the document hierarchy graph and the definitions graph to gather the complete context for a question.

For this example, we picked a simple regulatory document issued by the Central Bank of Malaysia, specifically this document (https://www.bnm.gov.my/documents/20124/938039/pd_compliance_Oct2015+%28Reissuance_May2016%29.pdf/6da52ca8-734a-d2fb-e214-093430ea8fb6?t=1592219195301), which sets out compliance rules for various organizations.

The question we want to answer is: How can the Board and the Chief Compliance Officer (CCO) manage control functions?

The ideal process for answering this question is:

• Get the definition of CCO from the definitions page.
• Retrieve the information in Section 6.3 and Section 7.2.
• Detect that Section 7.2 says "see Sections 7.3 and 7.4".
• Traverse to and retrieve Sections 7.3 and 7.4.
• Detect the mention of Section 9.1 and retrieve Section 9.1.

As a quick benchmark, when we ran the question through GPT-4o with agentic reasoning built in, we got the following answer:

    The management of control functions by the Board and the Chief Compliance Officer (CCO) involves several key responsibilities as outlined in the compliance document:

    Responsibilities of the Board:
    1. Oversight of Compliance Risk Management: The Board is responsible for overseeing the management of compliance risk within the financial institution. This includes approving the compliance policy, establishing the compliance function, and appointing the CCO.
    2. Approval and Support for the Compliance Function: The Board must ensure that the compliance function and the CCO have the appropriate standing, authority, and independence to carry out their duties effectively. The Board must also ensure that the compliance function is adequately resourced.
    3. Engagement with the CCO: The Board should engage with the CCO regularly to discuss issues faced by the compliance function and ensure the CCO has direct access to the Board.
    4. Annual Evaluation: The Board must evaluate the effectiveness of the institution’s management of compliance risk at least annually, considering assessments from senior management, internal audit, and the CCO.

    Responsibilities of the Chief Compliance Officer (CCO):
    1. Coordination and Management of Compliance Risk: The CCO is responsible for coordinating the identification, management, and monitoring of compliance risk across the institution. This includes ensuring compliance monitoring and testing are consistent across the organization.
    2. Reporting and Advising: The CCO must report regularly to senior management on the findings and analysis of compliance risks and must ensure that reports are readily available to internal audit and regulatory authorities. The CCO also advises the Board and senior management on legal and regulatory requirements, keeping them informed of developments and their implications.
    3. Independence: The CCO must maintain independence from business lines to effectively carry out the role of a control function. This includes ensuring that there is no conflict of interest in their responsibilities, reporting lines, or remuneration.
    4. Resources and Training: The CCO must ensure that the compliance function is sufficiently resourced with officers who have the necessary qualifications and experience. The CCO is also responsible for ensuring that adequate training is provided to officers on relevant legal and regulatory requirements.

    Interaction with Other Control Functions:
    The compliance function should be organized in a way that allows effective management of compliance risk, considering the size, nature, and complexity of the financial institution’s operations. Where responsibilities are shared between a dedicated compliance unit and other control functions, clear documentation and coordination are required to ensure that the CCO can perform their duties effectively without impairing independence or focus.

    By fulfilling these responsibilities, the Board and the CCO can ensure that the institution's compliance risks are managed effectively, supporting the overall safety and soundness of the financial institution.

This answer is interesting but incomplete, because it:

• Seems to focus on the keyword "compliance" in particular and highlights all the compliance obligations tied to other clauses, rather than focusing specifically on the Board and the CCO. For example, the cross-jurisdiction coordination mentioned in Section 8.2 is raised in the context of "compliance", as is the independence of remuneration mentioned in Section 7.8.
• Critically, seems to miss the key obligation between the Board and the CCO that is explicitly stated in Sections 7.3 and 7.4: Board approval is required when the CCO shares control functions. It also misses Section 9.1 on the separation of the audit and compliance functions.
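This is understandable, because recursively retrieving clauses, page numbers, and footnotes is not an explicit part of a typical semantic-similarity-based retrieval (RAG) pipeline.

Multi-Graph Multi-Agent Workflow Development Summary

To give a sense of what each agent does, we have included code snippets for each agent in the appendix.

Graph Creation

In this Notebook, we first parse the document structure with Reducto's document ingestion engine. The document structure breaks each page down into distinct elements, such as section headers, list items, or footers. These elements are grouped according to the order in which they appear and their implied hierarchy; for example, a section header is the parent of its list items. We then analyze the links in the document to identify connections between the extracted elements, which can be modeled in a lexical graph.

We then use our SDK (https://whyhow-ai.github.io/whyhow-sdk-docs/examples/create_graph_from_triples/) to import the chunks and triples into WhyHow's Knowledge Graph Studio to create the lexical graph.

We also built a legal definitions graph for the document. Something unique to legal documents is that each has a definitions section specifying exactly how particular terms are to be interpreted. This can vary by document, use case, and client. In this example, the document contains its definitions on pages 4-5. That text is extracted and passed to GPT-4o with a prompt to extract the legal terms and their definitions (returned verbatim). The output is then converted to a CSV file and uploaded as a separate graph using the SDK and a predefined schema. The definitions agent calls the definitions graph when needed to enrich the context with specific relevant definitions. In this example, the definitions agent is called after the initial clauses are retrieved.

We then pull the nodes from WhyHow into the Notebook and index the node information locally with LlamaIndex, keeping the embeddings generated by WhyHow. We combine LlamaIndex's vector retrieval, BM25, and keyword retrieval. In legal document use cases, querying and retrieval demand precise terminology: BM25 helps identify key terms in highly repetitive text, while keyword retrieval ensures that important terms are retrieved even when they appear infrequently.

We used LangGraph together with the WhyHow SDK and GPT-4o to build a multi-agent workflow around the lexical graph. In essence, when a query is passed in, the system first searches for relevant vector chunks via the initial search agent. In this example, the vector chunks are clauses or sub-clauses. The definitions agent is then called to enrich the clauses with relevant definitions from the definitions section. Next, the router agent detects whether additional linked sections or footers need to be referenced and, if so, retrieves and considers them. If subsequently retrieved clauses mention further clauses (as in this example), the recursive retrieval agent repeats this step recursively.

The clauses initially retrieved are Sections 6.3 and 7.2. The definitions graph is consulted to check whether the definitions section has any additional context to include. Additional definitions for "CCO" and "senior management" are included.

Section 6.3 includes sub-clause 6.3(f). Based on the information in the first retrieved clause, the router agent helps detect whether the material mentions clauses or footers. In this example, the first clause has an associated footer (footnote 3). The router agent then triggers the footer parsing agent, which identifies the relevant footer and returns it.

At this point another traversal is needed, and the recursive retrieval agent is used to traverse the lexical graph and retrieve the chunks/clauses in Sections 7.3 and 7.4.

The new information from Sections 7.3 and 7.4 is combined. Section 7.4(b) mentions Section 9.1. Because Section 7.4 contains a link to Section 9.1, the router agent detects this and instructs the recursive retrieval agent to retrieve it on the next pass. The answering agent keeps track of all incoming information in order to summarize it and formulate the final answer returned to the user.

The final constructed answer reflects our ideal process: it contains all the relevant information from the definitions page, Sections 6.3 and 7.2, footnote 3, and Sections 7.3, 7.4, and 9.1, gathered through intelligent traversal and accurately summarized for the user.

To make sure the result was not a fluke, we ran the final query 3 times, and each run successfully retrieved the relevant information.

    To manage control functions effectively, the Board and the Chief Compliance Officer (CCO) have distinct responsibilities that they must exercise:

    Responsibilities of the Board:
    - Approve critical decisions regarding the CCO, including appointment, remuneration, and termination (Section 6.3(a)).
    - Ensure the CCO has sufficient stature to engage effectively with senior management (Section 6.3(b)).
    - Regularly engage with the CCO to discuss compliance issues and consider interactions without senior management present (Section 6.3(c)).
    - Provide the CCO with unimpeded access to communicate with the board directly (Section 6.3(d)).
    - Support the CCO with adequate resources to perform duties effectively, including competent staff (Section 6.3(e)).
    - Satisfy themselves that combined responsibilities, if any, do not compromise the control environment (Section 6.3(f)).

    Responsibilities of the CCO:
    - Coordinate the identification and management of institution-wide compliance risks (Section 7.2(b)).
    - Ensure consistent conduct of compliance monitoring and testing across the organization (Section 7.2(b)).
    - Maintain independence and sufficient focus on compliance duties, even when tasked with additional control functions (Section 7.3).

    Shared Responsibilities & Coordination:
    - The board must approve any sharing of compliance function responsibilities between the compliance unit and other control functions (Section 7.4(a)).
    - Function responsibilities, including timely communication of issues, should be well-defined and documented (Section 7.2(a)).
    - Effective arrangements for coordination among control functions should be in place to facilitate the CCO’s responsibilities (Section 7.2(d)).
    - Compliance responsibilities must not compromise the separation of the internal audit function (Section 9.1).

    The board should ensure comprehensive oversight, and the CCO should maintain effective coordination and communication across the organization to manage control functions efficiently.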
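The traversal walked through above (initial clauses, then a footnote, then linked clauses) can be illustrated with a minimal sketch. It assumes a toy in-memory map of clause texts: the `CLAUSES` dictionary, its paraphrased wording, and the `collect_context` helper are hypothetical and not part of the Notebook. A regular expression stands in for the LLM-based router agent when spotting cross-references.

```python
import re
from typing import Dict, List

# Hypothetical clause texts, paraphrased for illustration only.
CLAUSES: Dict[str, str] = {
    "6.3": "The board must oversee the CCO (see footnote 3).",
    "footnote 3": "See paragraphs 7.3 and 7.4.",
    "7.2": "Responsibilities must be documented.",
    "7.3": "The CCO must remain independent.",
    "7.4": "Sharing of responsibilities needs board approval, subject to paragraph 9.1.",
    "9.1": "Compliance must be separate from internal audit.",
}

# Stand-in for the router agent: match "footnote N" or "N.M" references.
REFERENCE = re.compile(r"footnote \d+|\d+\.\d+")


def collect_context(start: List[str], max_passes: int = 5) -> List[str]:
    """Follow cross-references pass by pass until no new clause is found."""
    seen: List[str] = []
    frontier = list(start)
    for _ in range(max_passes):
        next_frontier: List[str] = []
        for clause_id in frontier:
            if clause_id in seen or clause_id not in CLAUSES:
                continue
            seen.append(clause_id)
            # Queue every reference that has not been visited yet.
            for ref in REFERENCE.findall(CLAUSES[clause_id]):
                if ref not in seen:
                    next_frontier.append(ref)
        if not next_frontier:
            break
        frontier = next_frontier
    return seen


context_ids = collect_context(["6.3", "7.2"])
print(context_ids)
```

With this toy data, starting from 6.3 and 7.2 reaches footnote 3, then 7.3 and 7.4, and finally 9.1, which mirrors the order of passes described in the walkthrough. The `max_passes` cap is an assumed safeguard against circular references.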
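Through this exercise, we developed a system that demonstrates:

• A multi-graph system in which each graph represents a different process and goal within the RAG system.
• Automated lexical graph creation for RAG using Reducto, WhyHow, and LlamaIndex.
• A multi-agent system that allows intelligent traversal based on document intent, returning information from each section and subsection in a structured way.
• A multi-graph, multi-agent system orchestrated by LangGraph.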
WhyHow.AI's Knowledge Graph Studio platform (https://github.com/whyhow-ai/knowledge-graph-studio) is an easy way to build modular, agentic knowledge graphs that combine workflows for LLMs, developers, and non-technical domain experts.

Appendix: Agent Code Snippets

Definitions Agent

    from typing import Dict, Optional

    from whyhow import WhyHow

    # WHYHOW_API_KEY, WHYHOW_API_URL and definitions_graph are defined earlier in the Notebook.

    def definitions_search(query_prompt: str, client: Optional[WhyHow] = None) -> Dict[str, str]:
        """
        Search for definitions of terms in a question prompt and return them as a dictionary.
        """
        if client is None:
            client = WhyHow(api_key=WHYHOW_API_KEY, base_url=WHYHOW_API_URL)

        definitions_response = client.graphs.query_unstructured(
            graph_id=definitions_graph.graph_id,
            query=query_prompt,
        )

        # The answer is expected to contain one "term: definition" pair per line.
        response_text = definitions_response.answer
        term_def_pairs = response_text.split('\n')
        definitions_dict = {}

        for pair in term_def_pairs:
            if ':' in pair:
                # Split on the first colon only, so colons inside a definition are kept.
                term, definition = pair.split(':', 1)
                definitions_dict[term.strip()] = definition.strip()

        return definitions_dict

    query_prompt = """Return me definitions for the terms in this query: "How can the Board and the CCO manage control functions?" Ensure the term-definition pairs are separated by newlines, properly capitalised"""

    definitions_dict = definitions_search(query_prompt)

    def print_prompt_definitions_dict(definitions_dict):
        prompt = "Relevant Definitions:\n"
        for term, definition in definitions_dict.items():
            prompt += f"{term}: {definition}\n"
        return prompt

    print(print_prompt_definitions_dict(definitions_dict))
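The parsing step inside `definitions_search` can be exercised without calling the WhyHow API. The sketch below isolates that logic in a hypothetical helper, `parse_definitions`, and feeds it a made-up response string. The definitions shown are placeholders, not text from the regulatory document.

```python
from typing import Dict


def parse_definitions(response_text: str) -> Dict[str, str]:
    """Split 'Term: definition' lines into a dictionary, ignoring other lines."""
    definitions: Dict[str, str] = {}
    for line in response_text.split("\n"):
        if ":" in line:
            # Split on the first colon only, so colons inside a definition survive.
            term, definition = line.split(":", 1)
            definitions[term.strip()] = definition.strip()
    return definitions


# Placeholder response text; a real response would come from the definitions graph.
sample = "CCO: placeholder definition\nBoard: placeholder: with a colon\n\nno separator here"
parsed = parse_definitions(sample)
print(parsed)
```

Lines without a colon are dropped silently, so a response that does not follow the requested "term: definition" layout would yield an empty dictionary rather than an error. That is worth keeping in mind when the prompt wording changes.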
Router Agent

    # AgentState and MultiAgentSearchLocalNode are defined elsewhere in the Notebook.

    def router_agent(state: AgentState) -> AgentState:
        # decide if process should stop or continue

        starter_prompt_footer = f"""
        You are an intelligent agent overseeing a multi-agent retrieval process of graph nodes from a document. These nodes are to answer the query:
        ```{state['query']}```

        Below this request is a list of nodes that were automatically retrieved.

        You must determine if the list of nodes is enough to answer the query. If there isn't enough information, you must identify any relevant footer information in the nodes.

        A node can contain footer information asking to look in another section/part of the document, which will require a separate natural language search.
        Example: If the footer says "see paragraph x", a search query e.g. "Return paragraph x to answer the query '{state['query']}'" should be made.

        If there are no further nodes worth analyzing, return an empty response. ONLY RETURN QUERIES FOR FOOTERS THAT ARE RELEVANT TO ANSWERING THE QUERY

        Else, if any relevant nodes require a footer search, specify the node_id and the search query.
        Nodes are identified by node_id and must be quoted in backticks.
        """

        starter_prompt_link = f"""
        You are an intelligent agent overseeing a multi-agent retrieval process of graph nodes from a document. These nodes are to answer the query:
        ```{state['query']}```

        Below this request is a list of nodes that were automatically retrieved.

        You must determine if the list of nodes is enough to answer the query. If there isn't enough information, you must identify any linked nodes that could be worth exploring.

        If there are no further nodes worth analyzing, return an empty response.

        Return a list of node_ids. ONLY RETURN NODE_IDS for NODES THAT ARE RELEVANT TO ANSWERING THE QUERY. Nodes are identified by node_id and must be quoted in backticks.
        """

        # collect latest nodes, and all nodes
        last_fetched_nodes_flattened: Dict[str, MultiAgentSearchLocalNode] = {}
        all_nodes_flattened: Dict[str, MultiAgentSearchLocalNode] = {}
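Both router prompts ask the model to quote node ids in backticks, but the snippet stops before the reply is parsed. As a rough illustration only, the sketch below shows one way such a reply could be turned into a list of ids. The `extract_node_ids` helper and the sample reply are hypothetical, and the Notebook may well parse the reply differently (for example through structured outputs).

```python
import re
from typing import List


def extract_node_ids(llm_response: str) -> List[str]:
    """Return backtick-quoted identifiers in order of appearance, without duplicates."""
    ids: List[str] = []
    for match in re.findall(r"`([^`]+)`", llm_response):
        node_id = match.strip()
        if node_id and node_id not in ids:
            ids.append(node_id)
    return ids


# Made-up reply; real node ids in the Notebook have a different shape.
reply = "Fetch `node-7.3` and `node-7.4`; `node-7.3` was already linked."
node_ids = extract_node_ids(reply)
print(node_ids)
```

An empty reply yields an empty list, which matches the prompts' instruction to return an empty response when no further nodes are worth analyzing.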
Supervisor Agent

    def supervisor_agent(state: AgentState) -> AgentState:

        # Look for search failures. This might be an instance where multiple searches were made for certain parts of the document, but no relevant information was found.
        # This means that the search has to be ended prematurely to prevent infinite loops.
        printout = ""
        for node in state["previous_nodes"]:
            printout += node.print_node_prompt()
        for node in state["last_fetched_context_nodes"]:
            printout += node.print_node_prompt()

        prompt = f"""
        You are a supervisor agent overseeing the multi-agent retrieval process of graph nodes from a document. The nodes are to answer the query:
        ```{state['query']}```

        Below is a list of nodes that were automatically retrieved, followed by a list of errors. If there are many similar, repeated errors in the retrieval process, where no further linked or relevant nodes could be retrieved, return END to end the process. Else return CONTINUE.
        Return only a single word, either END or CONTINUE.
        """

        completion = openai_client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": printout},
                {"role": "user", "content": state['search_failures']},
            ],
        )
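The supervisor snippet ends at the API call, so how the one-word reply becomes a stop-or-continue decision is not shown. The sketch below is one plausible way to do it, under stated assumptions: `should_continue` is a hypothetical helper, and the `max_passes` cap (compared against the `pass_count` counter that the recursive agent increments) is an assumed extra safeguard with an arbitrary default.

```python
def should_continue(reply: str, pass_count: int, max_passes: int = 5) -> bool:
    """Continue only on an explicit CONTINUE and while under the pass limit."""
    if pass_count >= max_passes:
        # Hard stop, regardless of what the supervisor model replied.
        return False
    # Anything other than CONTINUE (END, empty, unexpected text) ends the loop.
    return reply.strip().upper() == "CONTINUE"


print(should_continue("CONTINUE", pass_count=1))
print(should_continue("END", pass_count=1))
```

Treating every unexpected reply as a stop signal errs on the side of ending the loop, which fits the stated purpose of the supervisor: preventing infinite retrieval loops.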
Recursive Agent

    def recursive_retrieval(state: AgentState) -> AgentState:

        current_nodes = state["last_fetched_context_nodes"]

        for current_node in current_nodes:
            state["previous_nodes"].append(current_node)

        new_current_nodes = []

        # look up the nodes to fetch by id

        for node_id in state["node_links_to_fetch"]:
            # sometimes GPT returns node ids with or without backticks
            if node_id[0] == "`":
                node_id = node_id[1:-1]
            if node_id in local_nodes_map:
                new_current_nodes.append(local_nodes_map[node_id])
            else:
                state["search_failures"].append(f"Failed to fetch node with id: {node_id}")

        for node_id, search_query in state["node_footers_to_fetch"].items():
            # fetch nodes by keyword and bm25 search
            footer_retrieved_nodes = retrieve_with_keywords_bm25(search_query)
            # LLM prunes nodes that are not relevant
            footer_retrieved_nodes, _ = prune_nodes(search_query, footer_retrieved_nodes)

            for node in footer_retrieved_nodes:
                new_current_nodes.append(node)

            # if no nodes fetched, log failure
            if len(footer_retrieved_nodes) == 0:
                state["search_failures"].append(
                    f"Failed to fetch nodes for query: {search_query}"
                )

        state["last_fetched_context_nodes"] = new_current_nodes
        state["pass_count"] += 1
        state["node_footers_to_fetch"] = {}
        state["node_links_to_fetch"] = []

        return state
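One detail in `recursive_retrieval` is worth a closer look: the backtick handling indexes `node_id[0]`, which raises an `IndexError` on an empty string, and the `[1:-1]` slice drops the last character even when only a leading backtick is present. A slightly more defensive variant might look like the sketch below. `normalize_node_id` is a hypothetical helper, not something from the Notebook.

```python
def normalize_node_id(raw: str) -> str:
    """Strip surrounding whitespace and backticks; safe on empty input."""
    return raw.strip().strip("`").strip()


for candidate in ["`abc`", "abc", "`abc", " `abc` ", ""]:
    print(repr(normalize_node_id(candidate)))
```

An empty result can then be logged as a search failure instead of being looked up in `local_nodes_map`.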
Answering Agent

    def answering_agent(state: AgentState) -> AgentState:
        # answer the query
        prompt = f"""
        You are an answering agent. You will be given a list of document nodes that were automatically retrieved by the system. These nodes are to answer the query:
        ```{state['query']}```

        Give references to sections/paragraphs if possible, but do not output full node ids with backticks and the hash.
        """
Answer Variants Retrieved by the Multi-Graph Multi-Agent System

Answer #2:

    To manage control functions effectively, both the Board and the Chief Compliance Officer (CCO) must collaborate and establish a structured framework. Key measures include:

    Board’s Responsibilities:
    - The Board must approve the appointment, remuneration, and termination of the CCO, ensuring they have the necessary stature to engage effectively with senior management (S6.3(a), (b)).
    - The Board should engage regularly with the CCO to discuss compliance issues and ensure the CCO has direct access to the Board (S6.3(c), (d)).
    - The Board must also assure that the CCO is granted adequate resources and support, including competent officers, to fulfill their duties (S6.3(e)).

    CCO’s Role and Independence:
    - The CCO should oversee the institution-wide compliance risk management and ensure that responsibilities are clearly documented and communicated in the compliance policy (S7.2(a), (b)).
    - If the CCO is involved in other control functions, they must ensure their independence and ability to focus on compliance responsibilities remain uncompromised (S7.3).

    Coordination with Other Functions:
    - Proper coordination between the CCO and other control functions is essential. This includes robust information flows and mechanisms for seeking compliance advice (S7.2(d)).
    - The Board must approve the sharing of compliance responsibilities to ensure there is no dilution of focus or effectiveness (S7.4(a)).

    Separation from Internal Audit:
    - Compliance functions should remain distinct from the internal audit to maintain the efficacy of independent reviews (S7.4(b), S9.1).

    Training and Capacity Building:
    - The compliance function is responsible for training and providing guidance to other officers on legal and regulatory requirements to manage compliance risk effectively (S8.11).

    By implementing these steps, the Board and the CCO can effectively manage control functions, maintaining robust compliance and governance frameworks.

Answer #3:

    The management of control functions by the Board and the Chief Compliance Officer (CCO) involves several key responsibilities and structures:

    Role of the Board in Managing CCO Functions:
    - The board must approve the appointment, remuneration, and termination of the CCO (Section 6.3(a)).
    - The board should ensure that the CCO has sufficient stature to interact effectively with the CEO and senior management (Section 6.3(b)).
    - Regular engagement between the board and the CCO is important to discuss compliance issues directly (Section 6.3(c)).
    - The CCO must have direct, unimpeded access to the board (Section 6.3(d)).
    - There must be adequate resources and support for the CCO to perform his duties effectively (Section 6.3(e)).

    Shared Responsibilities and Independence:
    - Where compliance functions are shared, the board must approve this arrangement, and responsibilities should be clearly defined and documented in the compliance policy (Section 7.2).
    - The CCO should not assume responsibilities for internal audit, as this can compromise independent review processes (Sections 7.4, 9.1).
    - The CCO must ensure that their independence and ability to focus on compliance are not impaired by additional responsibilities (Section 7.3).

    Responsibilities Within the Organization:
    - Compliance is the responsibility of all officers within the institution. Business lines manage compliance risk through their managerial controls, while the compliance function ensures that these controls are adequate (Section 1.2).
    - The internal audit function provides independent assurance on the quality and effectiveness of the institution’s controls, including those concerning compliance (Section 1.2(c)).

    Coordination Across Control Functions:
    - Arrangements for coordination among control functions and the CCO must promote a consistent approach to managing compliance risk, with adequate information flows and avenues for advice (Section 7.2(d)).

    By following these guidelines, the Board and the CCO can manage the compliance control functions effectively, ensuring that compliance risks are appropriately identified, managed, and mitigated across the organization.
|