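A Power Tool for LLM RAG: Monitoring, Tracing, and Fine-Tuning LLM and RAG Applications with LangSmith

In the fast-moving field of AI, large language models (LLMs) have become a mainstay of modern applications, where they generate responses, improve customer interactions, and assist with content creation. LangSmith offers clear guidance and strong tooling for making LLM applications perform better.

This article shows how LangSmith can improve an LLM application and support developers working in AI.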

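1 Introduction to LangSmith

LangSmith is a monitoring and optimization tool designed for LLM and RAG systems. It provides real-time insight into application performance, from response time to accuracy, so that developers can manage and improve their LLM applications precisely. This article focuses on using LangSmith for debugging, monitoring, and testing.

The figure below gives a high-level view of the LangChain ecosystem.

Figure: The LangSmith ecosystem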

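2 Why Choose LangSmith

DevOps and MLOps reshaped web development and data science respectively, and the same ideas are now being applied to managing AI applications. As demand for efficient AI systems grows, integration, tracing, and monitoring matter more and more. LangSmith addresses exactly this need, helping teams improve the efficiency and reliability of their AI applications. The best way to see this is through a practical example.

Figure: Key features of LangSmith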

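3 Integrating LangSmith into an LLM Workflow

Here we build a RAG-based question-answering system that answers users' questions about geopolitics, using source text about the G7 and G20 countries. The steps are:

Generate question-answer pairs from the source content;
Create a dataset in LangSmith and load the data into it;
Write a function that processes the input and gets output from the LLM;
Use LangSmith's built-in evaluator to test the accuracy of the LLM's answers;
Trace every step and analyze the related metrics.

Figure: The Q&A RAG system integrated with LangSmith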

Step 1: Load the libraries

# Install the langchain, langchain_openai, and langchain_core libraries with pip
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import pandas as pd

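Step 2: Project setup and API access

To set up the project you need two API keys: one for the LangSmith integration and one for the LLM. This example uses the OpenAI API, but you can also try other models, such as Gemini or those on Hugging Face.

To obtain an API key:

Log in to LangSmith or OpenAI and go to the API settings page.
In the "API keys" section, choose to create a new API key and set the appropriate permissions.
Once the key is generated, copy it and store it safely for later use in your application or integration.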

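Step 3: Set the API keys

When working in Google Colab, we store the API keys in Colab secrets. Setting the environment variable LANGCHAIN_TRACING_V2 to "true" turns on LangSmith tracing, and LANGCHAIN_PROJECT defines the project name under which the traces appear in LangSmith.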

from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["LANGSMITH_API_KEY"] = userdata.get('LANGSMITH_API_KEY')
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "DEMO"
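Outside Colab there is no `userdata` store, so the keys usually have to be exported in the shell beforehand. The following is a minimal sketch, not part of the original walkthrough, that checks whether the variables used above are present before any client is created. The helper name `missing_env_vars` is hypothetical.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to test without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Variable names taken from the setup code above.
REQUIRED = ["OPENAI_API_KEY", "LANGSMITH_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Failing early with a readable message tends to be easier to debug than an authentication error raised deep inside the first traced call.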

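Step 4: Data processing

Generate a sample question-and-answer set from the source text, then build a DataFrame containing the question-answer pairs.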

context = """The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major economy) that have dominated the world and its institutions, in some cases for centuries, and retain the ambition to maintain that position by policy coordination amongst themselves and by co-opting rising powers, including India, given the shifts in global power in recent decades.
The G7 recognised that they could not manage the 2008 financial crisis on their own and needed a wider international partnership, but one under their aegis. With this in mind, the G20 forum hitherto at the finance minister level was raised to the summit level. The G20 agenda is, however, shifting increasingly towards the interests and priorities of the developing countries (now being referred to as the Global South). During India’s G20 presidency, with India holding the Voice of the Global South summits before presiding over the G20 and at the conclusion of its work, and with the inclusion of the African Union as a G20 permanent member at India’s initiative, the pro-Global South content of the G20 agenda has got consolidated.
Both the G7 and the G20, however, face challenges from other platforms for consensus-building on global issues. BRICS, a group of non-Western countries, is getting expanded to resist the hegemony of the West that is still expressing itself in the form of sanctions, the weaponising of finance, regime change policies and double standards in addressing issues of democracy and human rights etc. An expanded BRICS will rival both the G7 and the G20 as a platform for promoting multipolarity, a greater role of developing countries in global governance, more equity in international relations, and introducing much-needed reforms in the international system."""


inputs = [
    "What is the G7, and how has it historically positioned itself in global governance?",
    "How did the 2008 financial crisis influence the role of the G20, and how has the agenda shifted?",
    "How has India influenced the G20 agenda during its presidency?",
    "What challenges do the G7 and G20 face from other global platforms?",
    "How does the expansion of BRICS pose a threat to the G7 and G20 in terms of global influence?"
]

outputs = [
    "The G7 is a group of Western nations, including Japan, which has historically dominated global institutions and policymaking. The G7's position stems from the economic and political power of its members, and they have coordinated policies to maintain their influence. In response to shifts in global power, the G7 has also sought to co-opt rising powers, such as India, in its strategic planning.",
    "The G7 recognized that it could not handle the 2008 financial crisis alone and needed broader international cooperation. As a result, the G20, which had previously operated at the finance minister level, was elevated to the summit level to ensure greater global participation under G7 guidance. Over time, the G20's agenda has shifted more towards the interests of developing countries, especially under India’s leadership, where pro-Global South priorities have become more prominent, including the inclusion of the African Union as a permanent member.",
    "During India’s G20 presidency, the country actively promoted the interests of developing countries by holding the 'Voice of the Global South' summits. India also pushed for the inclusion of the African Union as a permanent member of the G20, consolidating the agenda towards addressing the concerns of the Global South, such as greater equity and representation in global governance.",
    "Both the G7 and G20 face challenges from other groups like BRICS, which consists of non-Western countries seeking to resist Western dominance. BRICS has expanded as a counterbalance to the G7's influence, particularly criticizing Western sanctions, financial controls, and regime change policies. An expanded BRICS aims to promote multipolarity, increase the role of developing countries in global governance, and push for reforms in the international system.",
    "The expansion of BRICS is a direct challenge to the G7 and G20 as it aims to offer a platform that promotes multipolarity and reduces Western hegemony. By advocating for greater equity in international relations and pushing for reforms in global governance structures, an expanded BRICS seeks to rival the G7 and G20, providing an alternative consensus-building mechanism for developing nations and non-Western powers."
]

# Build the dataset: one row per question-answer pair
qa_pairs = [{"question": q, "answer": a} for q, a in zip(inputs, outputs)]
df = pd.DataFrame(qa_pairs)
df

Figure: The resulting DataFrame of question-answer pairs
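Because `zip` silently drops unmatched items when the two lists differ in length, it can help to validate the pairs before uploading them. The sketch below is an optional addition rather than part of the original walkthrough, and the helper name `build_qa_pairs` is hypothetical.

```python
def build_qa_pairs(questions, answers):
    """Pair questions with answers, rejecting mismatched or empty entries."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    pairs = []
    for i, (q, a) in enumerate(zip(questions, answers)):
        if not q.strip() or not a.strip():
            raise ValueError(f"empty question or answer at index {i}")
        pairs.append({"question": q.strip(), "answer": a.strip()})
    return pairs


# Small illustrative call; in the walkthrough this would receive
# the `inputs` and `outputs` lists defined above.
demo = build_qa_pairs(["What is the G7?"], ["A group of Western nations."])
print(demo)
```

The returned list has the same shape as `qa_pairs`, so it could be passed to `pd.DataFrame` in the same way.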

Step 5: Create a dataset in LangSmith

Using the data from the previous step, create a new dataset in LangSmith and give it a name and a description.

from langsmith import Client

client = Client()
dataset_name = "Geo-politics"

# Create the dataset in LangSmith and store the examples in it
dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="QA pairs about Geo-politics model.",
)
client.create_examples(
    inputs=[{"question": q} for q in inputs],
    outputs=[{"answer": a} for a in outputs],
    dataset_id=dataset.id,
)

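After running the code above, you get a link to the dataset and its tests in LangSmith. Alternatively, go directly to the LangSmith site (https://smith.langchain.com), log in, and open the "Datasets & Testing" tab to continue.

Figure: Datasets and testing in LangSmith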
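Re-running the dataset creation cell with a name that already exists is likely to fail with a conflict error. A minimal sketch of a get-or-create helper follows. It assumes the client exposes `has_dataset`, `read_dataset`, and `create_dataset` with the keyword arguments shown, which matches my understanding of `langsmith.Client` but should be checked against the installed version. The helper name `get_or_create_dataset` is hypothetical.

```python
def get_or_create_dataset(client, name, description=""):
    """Return (dataset, created) for the dataset called `name`.

    `client` is expected to behave like langsmith.Client. `created`
    is True only when the dataset did not exist before this call.
    """
    if client.has_dataset(dataset_name=name):
        return client.read_dataset(dataset_name=name), False
    dataset = client.create_dataset(dataset_name=name, description=description)
    return dataset, True


# Example usage (requires a configured LangSmith API key):
# from langsmith import Client
# dataset, created = get_or_create_dataset(
#     Client(), "Geo-politics", "QA pairs about Geo-politics model."
# )
```

Uploading examples only when `created` is true avoids adding the same question-answer pairs twice.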

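Step 6: Generate output with the LLM

With the API keys, the dataset, and the rest of the configuration in place, we can now write a function that takes an input question and generates a response with the LLM, in this case an OpenAI model. The function returns a dictionary containing the answer.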

import openai
from langsmith.wrappers import wrap_openai

# Wrap the OpenAI client so that its calls are traced in LangSmith
openai_client = wrap_openai(openai.Client())


def get_response_from_llm(inputs: dict) -> dict:
    """
    Generates answers to user questions based on a provided website
    text using OpenAI API.

    Parameters:
    inputs (dict): A dictionary with a single key 'question',
    representing the user's question as a string.

    Returns:
    dict: A dictionary with a single key 'answer', containing the
    generated answer as a string.
    """

    # System prompt: stuff the source text into the instructions
    system_msg = (
        f"Answer user questions in 2-3 sentences about this context:\n\n\n{context}"
    )

    # Pass in the source text and the user's question
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": inputs["question"]},
    ]

    # Call OpenAI
    response = openai_client.chat.completions.create(
        messages=messages, model="gpt-3.5-turbo"
    )

    # Return the response text in a dictionary
    return {"answer": response.choices[0].message.content}
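The prompt construction inside the function can also be pulled out into a pure helper, which makes it possible to inspect or unit-test the messages without calling the API. This is a sketch of one possible refactoring, not the article's code. The name `build_messages` and the `max_context_chars` cap are hypothetical.

```python
def build_messages(context, question, max_context_chars=12000):
    """Build chat messages that stuff `context` into the system prompt.

    max_context_chars is an illustrative cap chosen for this sketch,
    not a value from the walkthrough above.
    """
    trimmed = context[:max_context_chars]
    system_msg = (
        "Answer user questions in 2-3 sentences about this context:"
        f"\n\n\n{trimmed}"
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": question},
    ]


messages = build_messages("The G7 is a club of Western nations.", "What is the G7?")
print(messages[0]["role"], messages[1]["content"])
```

Inside `get_response_from_llm`, the returned list would take the place of the inline `messages` literal.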

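To verify that the LLM's output is accurate and does not drift from the facts, the next step uses LangSmith's built-in evaluation tooling.

Step 7: Evaluate the RAG system with an LLM

To evaluate the model's performance, we compare the LLM's output with the ground truth. There are several ways to do this. For example, cosine similarity can measure how closely the two match, with a higher score meaning a closer match. Here, however, we use LangSmith's built-in cot_qa evaluator, which is designed specifically for question-answering systems.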

from langsmith.evaluation import evaluate, LangChainStringEvaluator

# Evaluator: chain-of-thought QA correctness
qa_evaluator = [LangChainStringEvaluator("cot_qa")]
dataset_name = "Geo-politics"

experiment_results = evaluate(
    get_response_from_llm,
    data=dataset_name,
    evaluators=qa_evaluator,
    experiment_prefix="LLMOuput",
    # Any experiment metadata can be specified here
    metadata={
        "variant": "stuff website context into gpt-3.5-turbo",
    },
)
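For comparison with cot_qa, the cosine-similarity approach mentioned in this step can be sketched as a custom evaluator. The version below uses simple bag-of-words counts rather than embeddings, which is a simplification chosen so the example runs with the standard library only. It assumes that custom evaluators receive `run` and `example` objects with an `outputs` dictionary and return a dictionary with `key` and `score`, which is my understanding of the LangSmith convention. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from types import SimpleNamespace


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors, in [0, 1]."""
    counts_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_evaluator(run, example):
    """Score the run's answer against the reference answer."""
    predicted = (run.outputs or {}).get("answer", "")
    reference = (example.outputs or {}).get("answer", "")
    return {
        "key": "cosine_similarity",
        "score": cosine_similarity(predicted, reference),
    }


# Offline check with stand-in objects instead of real LangSmith runs.
run = SimpleNamespace(outputs={"answer": "The G7 is a group of Western nations"})
example = SimpleNamespace(outputs={"answer": "The G7 is a club of Western nations"})
print(cosine_evaluator(run, example))
```

Under that assumption, `cosine_evaluator` could be appended to the `evaluators` list passed to `evaluate`. Word overlap rewards shared vocabulary rather than factual agreement, which is one reason an LLM-based grader such as cot_qa is often preferred for QA.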

Step 8: Tracing and monitoring in LangSmith

After the code runs, LangSmith tracks the results.

Figure: Tracing, monitoring, and evaluation in LangSmith

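Results overview:

We uploaded five question-answer pairs to LangSmith. The answers generated by the LLM are recorded in the third column, and a "success" label next to each item indicates that it matches the reference answer.

Correctness was evaluated with LangChainStringEvaluator.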

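Figure: A closer look at LangSmith's output for a single run

Detailed analysis:

Click the first output to see its detailed results. The left side shows the model name, gpt-3.5-turbo, and the right side shows additional metrics such as the timestamp and latency.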
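The latency shown for a single run can also be aggregated across runs. The sketch below assumes each run object carries `start_time` and `end_time` as datetimes, as I understand LangSmith run records to do. The helper name `latency_summary` is hypothetical, and the demo uses stand-in objects so that it runs offline.

```python
from datetime import datetime, timedelta
from statistics import mean
from types import SimpleNamespace


def latency_summary(runs):
    """Summarise latency in seconds for runs that have both timestamps."""
    latencies = sorted(
        (r.end_time - r.start_time).total_seconds()
        for r in runs
        if r.start_time is not None and r.end_time is not None
    )
    if not latencies:
        return {"count": 0, "mean": None, "max": None}
    return {
        "count": len(latencies),
        "mean": mean(latencies),
        "max": latencies[-1],
    }


# Stand-in runs; with a real client these might come from something like
# client.list_runs(project_name=...), where the project name is up to you.
t0 = datetime(2024, 1, 1, 12, 0, 0)
demo_runs = [
    SimpleNamespace(start_time=t0, end_time=t0 + timedelta(seconds=1)),
    SimpleNamespace(start_time=t0, end_time=t0 + timedelta(seconds=3)),
    SimpleNamespace(start_time=t0, end_time=None),  # still running, skipped
]
print(latency_summary(demo_runs))
```

Runs without an end time are skipped, so in-flight requests do not distort the summary.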

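4 Conclusion

As AI technology advances rapidly, tools like LangSmith are becoming increasingly important. They let developers and data scientists work together to improve the efficiency and stability of AI systems, so that applications can keep up with new market demands and deliver a good user experience. With LangSmith, developers can approach the challenges of AI development with more confidence and get more out of their AI applications.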