
[LLM Application Frameworks] Building RAG with DSPy

Posted by 链载Ai, yesterday at 21:00

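1. Introduction
In this article we use DSPy to build a simple technical question-answering system about Linux applications, and compare the results with and without retrieval-augmented generation (RAG).
2. DSPy Basic Modules
(1) Local environment setup
  • Install DSPy: Python 3.9 or later is required. Here we install the required version from the Git repository:
    conda create --name py39 python=3.9
    conda activate py39
    pip install git+https://github.com/stanfordnlp/dspy.git@2.5.29
  • Deploy a local LM: we use llama3.2 here. Install Ollama and run the LM service:
    curl -fsSL https://ollama.ai/install.sh | sh
    ollama run llama3.2
  • Check that the environment works:
    import dspy
    llama32 = dspy.LM('ollama_chat/llama3.2', api_base='http://localhost:11434', api_key='')
    dspy.configure(lm=llama32)
(2) DSPy basic modules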

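In DSPy you can prompt the language model directly with lm(prompt="prompt") or lm(messages=[...]). However, DSPy provides modules as a better way to define language-model functions.

The simplest module is dspy.Predict. It takes a DSPy signature, that is, a structured input/output schema, and returns a callable for the behavior you specified. The example below uses the "inline" signature notation to declare a module that takes a question (of type str) as input and produces a response as output.

    qa = dspy.Predict('question: str -> response: str')
    response = qa(question="what are high memory and low memory on linux?")
    print(response.response)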
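As a rough illustration of what an inline signature string encodes, here is a minimal sketch that splits such a string into input and output fields. The helper name parse_signature is hypothetical, and the rule that untyped fields default to str is an assumption made for this sketch; it is not DSPy's own parser and it does not handle types that contain commas.

```python
from typing import Dict, Tuple


def parse_signature(signature: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split 'a: str, b -> c: str' into (inputs, outputs) maps of field name to type name."""
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    inputs_part, outputs_part = signature.split("->")

    def parse_fields(part: str) -> Dict[str, str]:
        fields: Dict[str, str] = {}
        for item in part.split(","):
            item = item.strip()
            if not item:
                continue
            name, _, type_name = item.partition(":")
            # Assumption for this sketch: a field without a type annotation is a str.
            fields[name.strip()] = type_name.strip() or "str"
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


inputs, outputs = parse_signature("question: str -> response: str")
print(inputs, outputs)
```

The same helper also accepts the untyped form used later for the RAG module, 'context, question -> response'.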
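In this example, to run the qa module DSPy passes your signature, the language model, and the inputs to an adapter (Adapter), a layer that structures the inputs and parses the structured outputs to fit your signature. You can easily inspect the last n prompts that DSPy sent:
    dspy.inspect_history(n=1)  # prints the prompts itself, so no print() is needed
The result looks like this: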
    [2025-01-10T11:06:24.275829]

    System message:

    Your input fields are:
    1. `question` (str)

    Your output fields are:
    1. `response` (str)

    All interactions will be structured in the following way, with the appropriate values filled in.

    [[ ## question ## ]]
    {question}

    [[ ## response ## ]]
    {response}

    [[ ## completed ## ]]

    In adhering to this structure, your objective is:
    Given the fields `question`, produce the fields `response`.

    User message:

    [[ ## question ## ]]
    what are high memory and low memory on linux?

    Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

    Response:

    [[ ## response ## ]]
    High Memory and Low Memory on Linux refer to two different conditions that affect system performance. Here's a brief explanation of each:
    ... (omitted) ...
    [[ ## completed ## ]]
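The field markers in the prompt above suggest how an adapter layer might work. The following is a minimal sketch, not DSPy's actual Adapter implementation: the function names format_user_message and parse_completion are hypothetical, and the marker syntax is simply copied from the log shown above.

```python
import re
from typing import Dict, List

# Matches headers such as "[[ ## response ## ]]" and captures the field name.
FIELD_HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def format_user_message(inputs: Dict[str, str], output_fields: List[str]) -> str:
    """Render each input under its own header, then ask for the output fields."""
    blocks = [f"[[ ## {name} ## ]]\n{value}" for name, value in inputs.items()]
    blocks.append(
        "Respond with the corresponding output fields, starting with the field "
        f"`[[ ## {output_fields[0]} ## ]]`, and then ending with the marker for "
        "`[[ ## completed ## ]]`."
    )
    return "\n\n".join(blocks)


def parse_completion(text: str, output_fields: List[str]) -> Dict[str, str]:
    """Collect the text that follows each expected field header."""
    # re.split with one capture group yields [prefix, name1, body1, name2, body2, ...].
    parts = FIELD_HEADER.split(text)
    parsed: Dict[str, str] = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        if name in output_fields:
            parsed[name] = body.strip()
    missing = [field for field in output_fields if field not in parsed]
    if missing:
        raise ValueError(f"missing output fields: {missing}")
    return parsed


message = format_user_message(
    {"question": "what are high memory and low memory on linux?"}, ["response"]
)
print(message)
```

Parsing is strict on purpose: if the model omits an expected field header, the sketch raises an error instead of returning a partial result.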
DSPy ships with several built-in modules, such as dspy.ChainOfThought, dspy.ProgramOfThought, and dspy.ReAct. These can be used interchangeably with the basic dspy.Predict.
3. Building and Optimizing RAG

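DSPy's basic features are already enough to implement many things quickly. To build a high-quality system and keep improving it over time, however, you need to evaluate the system's quality and iterate quickly with DSPy's tools, such as optimizers. Measuring the quality of a DSPy system usually requires:

  • Input samples: for example, the questions from a set of question-answer pairs. This means loading a dataset that contains questions and their reference answers.
  • An output quality metric: metrics come in many kinds. Some need ground-truth labels for the ideal output, for example in classification or question answering; others are self-supervised, for example checking faithfulness or the absence of hallucination. For question answering, answer quality can often be judged by how well the system response covers all the key facts in the reference answer (recall) and, conversely, by how well it avoids saying things that are not in the reference answer (precision). Combining the two is essentially a "semantic F1", so we can load the SemanticF1 metric from DSPy and use dspy.Evaluate to compute the average score.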
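To make the "semantic F1" idea concrete, here is a minimal sketch of how precision and recall combine into an F1 score. It is a toy stand-in: DSPy's SemanticF1 relies on a language model to judge semantic overlap, whereas the hypothetical toy_semantic_f1 below uses exact set overlap over placeholder fact strings.

```python
from typing import Set


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def toy_semantic_f1(response_facts: Set[str], gold_facts: Set[str]) -> float:
    """Toy stand-in: exact set overlap instead of LM-judged semantic overlap."""
    if not response_facts or not gold_facts:
        return 0.0
    overlap = len(response_facts & gold_facts)
    precision = overlap / len(response_facts)  # share of the response backed by the reference
    recall = overlap / len(gold_facts)         # share of the reference's key facts covered
    return f1_score(precision, recall)


# Placeholder facts, purely illustrative.
gold = {"fact A", "fact B", "fact C"}
response = {"fact A", "unsupported claim"}
print(round(toy_semantic_f1(response, gold), 3))
```

Because F1 is a harmonic mean, a response that covers every key fact but adds a lot of unsupported content still scores low, and so does a short response that is fully supported but misses most key facts.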

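The example below uses DSPy on Colab to build a RAG system that answers technical questions. The input samples are StackExchange-based questions and their correct answers taken from the RAG-QA Arena dataset, and SemanticF1 is the evaluation metric:

(1) Colab environment setup
    !apt-get install -y pciutils lshw
    !curl -fsSL https://ollama.ai/install.sh | sh
    !pip install dspy
    !pip install faiss-cpu
(2) Start the Ollama server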
    import os
    import threading
    import subprocess
    import time

    def ollama():
        # Listen on all interfaces and accept requests from any origin
        os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
        os.environ['OLLAMA_ORIGINS'] = '*'
        subprocess.Popen(["ollama", "serve"])
        time.sleep(10)  # give the server time to start

    ollama_thread = threading.Thread(target=ollama)
    ollama_thread.start()
    ollama_thread.join()  # wait out the startup delay before pulling the model

    def llama_run():
        # Download the llama3.2 model in the background
        subprocess.Popen(["ollama", "pull", "llama3.2"])

    llama_run_thread = threading.Thread(target=llama_run)
    llama_run_thread.start()
Check that the server started successfully:
    !curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
A reply like the following indicates that the Ollama service is running in Colab:
    {"id":"chatcmpl-676","object":"chat.completion","created":1745216425,"model":"llama3.2","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":26,"completion_tokens":10,"total_tokens":36}}
(3) Build and iterate on the RAG system
    import dspy
    import ujson
    from dspy.utils import download
    import random
    from dspy.evaluate import SemanticF1
    from sentence_transformers import SentenceTransformer

    # Use the locally deployed Llama 3.2 model (served by Ollama) through a custom API endpoint
    lm = dspy.LM('ollama_chat/llama3.2', api_base='http://localhost:11434', api_key='')
    dspy.configure(lm=lm)

    # Download the RAG-QA Arena tech Q&A dataset from Hugging Face and convert it to
    # dspy.Example objects, marking `question` as the input field
    download("https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_examples.jsonl")
    with open("ragqa_arena_tech_examples.jsonl") as f:
        data = [ujson.loads(line) for line in f]
    data = [dspy.Example(**d).with_inputs('question') for d in data]

    # Shuffle with a fixed seed, then split into a training set (20 examples),
    # a dev set (20 examples), and a test set (500 examples)
    random.Random(0).shuffle(data)
    trainset, devset, testset = data[:20], data[200:220], data[500:1000]
    print(f'{len(trainset)}, {len(devset)}, {len(testset)}')

    # Set up the evaluator with the SemanticF1 metric
    metric = SemanticF1(decompositional=True)
    evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=12,
                             display_progress=True, display_table=2)

    # Download the RAG-QA Arena tech corpus from Hugging Face
    download("https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_corpus.jsonl")

    # Load the corpus: truncate each document to 6000 characters, keep only its
    # first line, and append an ellipsis
    max_characters = 6000  # truncates documents above the 99th percentile in length
    with open("ragqa_arena_tech_corpus.jsonl") as f:
        corpus = [ujson.loads(line)['text'][:max_characters].split('\n')[0] + '...' for line in f]
    print(f"Loaded {len(corpus)} documents. Will encode them below.")

    # Generate text vectors with the all-MiniLM-L6-v2 sentence-embedding model
    embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    def embedder(texts):
        return embedding_model.encode(texts)

    topk_docs_to_retrieve = 5  # number of documents to retrieve per search query
    search = dspy.retrievers.Embeddings(embedder=embedder, corpus=corpus, k=topk_docs_to_retrieve)

    # A dspy.Module that combines the retriever (search) with a generator (ChainOfThought)
    class RAG(dspy.Module):
        def __init__(self):
            # ChainOfThought with the signature context, question -> response
            self.respond = dspy.ChainOfThought('context, question -> response')

        def forward(self, question):
            # Forward pass: retrieve documents -> pass them as context -> generate the final response
            context = search(question).passages
            return self.respond(context=context, question=question)

    rag = RAG()
    rag(question="what are high memory and low memory on linux?")
    print(evaluate(RAG()))

    # Use the MIPROv2 optimizer to tune the instructions and few-shot examples
    tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=12)
    optimized_rag = tp.compile(rag, trainset=trainset,
                               max_bootstrapped_demos=2, max_labeled_demos=2,
                               requires_permission_to_run=False)

    # Compare the program before and after optimization
    baseline = rag(question="cmd+tab does not work on hidden or minimized windows")
    print(baseline.response)
    pred = optimized_rag(question="cmd+tab does not work on hidden or minimized windows")
    print(pred.response)
    print(evaluate(optimized_rag))
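The retrieval step in forward can be pictured as "embed the query, score every document, keep the top k". The following is a minimal sketch of that idea, not the internals of dspy.retrievers.Embeddings. The toy_embed function is a hypothetical word-count embedder standing in for all-MiniLM-L6-v2, and the three-document corpus is made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in embedder: lowercase word counts as a sparse vector."""
    return {token: float(count) for token, count in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, corpus: List[str], k: int) -> List[str]:
    """Return the k documents most similar to the query, best first."""
    query_vec = toy_embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, toy_embed(doc)), reverse=True)
    return ranked[:k]


# Made-up corpus for illustration only.
toy_corpus = [
    "linux kernel high memory and low memory zones",
    "cmd tab window switching on macos",
    "recursively delete empty directories",
]
passages = top_k("high memory and low memory on linux", toy_corpus, k=2)
print(passages)
```

A real embedding model replaces the word counts with dense vectors, so semantically related texts score high even when they share no words, but the ranking step stays the same.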

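Part of the output is shown below:

  • Evaluation score of the untuned RAG
  • How the dspy.MIPROv2 optimizer works

STEP 1: Bootstrap a small number of few-shot examples to guide the model on the task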

    ==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
    2025/04/21 08:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

    2025/04/21 08:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=19 sets of demonstrations...
    Bootstrapping set 1/19
    Bootstrapping set 2/19
    Bootstrapping set 3/19
     75%|███████▌  | 3/4 [00:00<00:00, 9.09it/s]
    Bootstrapped 2 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
    ...
    100%|██████████| 4/4 [00:00<00:00, 9.12it/s]
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2:
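STEP 2: Propose instructions using the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip.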
    ==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Proposing instructions...
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `context`, `question`, produce the fields `response`.
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Check the documentation for the specific operating system version or use command-line tools like dscl and fs_usage to find the location.
    ...
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: 17: Given the context provided, respond with a step-by-step guide on how to recursively delete empty directories in your home directory.
STEP 3: Use Bayesian optimization to find the best combination of instructions and few-shot examples
    ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 1 / 25 - Full Evaluation of Default Program ==
    Bootstrapped 2 full traces after 3 examples for up to 1 rounds, amounting to 4 attempts.
    Average Metric: 9.65 / 16 (60.3%): 100%|██████████| 16/16 [00:00<00:00, 65.16it/s]
    2025/04/21 08:56:52 INFO dspy.evaluate.evaluate: Average Metric: 9.654891774891775 / 16 (60.3%)
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 60.34

    /usr/local/lib/python3.11/dist-packages/optuna/_experimental.py:31: ExperimentalWarning: Argument ``multivariate`` is an experimental feature. The interface can change in the future.
      warnings.warn(
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 25 =====

    Average Metric: 9.90 / 16 (61.8%): 100%|██████████| 16/16 [00:00<00:00, 60.96it/s]
    2025/04/21 08:56:52 INFO dspy.evaluate.evaluate: Average Metric: 9.895873015873017 / 16 (61.8%)
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 61.85
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.85 with parameters ['Predictor 0: Instruction 12', 'Predictor 0: Few-Shot Set 7'].
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [60.34, 61.85]
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.85
    2025/04/21 08:56:52 INFO dspy.teleprompt.mipro_optimizer_v2: ========================
    ....
    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 26 / 25 =====

    Average Metric: 9.80 / 16 (61.2%): 100%|██████████| 16/16 [00:00<00:00, 61.79it/s]
    2025/04/21 08:56:59 INFO dspy.evaluate.evaluate: Average Metric: 9.799439775910363 / 16 (61.2%)
    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.25 with parameters ['Predictor 0: Instruction 16', 'Predictor 0: Few-Shot Set 16'].
    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [60.34, 61.85, 58.79, 55.82, 50.58, 53.71, 63.6, 67.79, 62.04, 65.41, 67.79, 64.42, 67.79, 57.3, 58.93, 62.62, 63.39, 63.53, 60.06, 56.46, 58.87, 63.79, 67.79, 54.2, 67.79, 61.25]
    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 67.79
    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: =========================

    2025/04/21 08:56:59 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 67.79!
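To read the trial log more easily, the "Scores so far" list can be summarized with a few lines of Python. This is a minimal sketch for interpreting the log, not part of the optimizer; the helper name summarize_trials is hypothetical, and the scores are copied from the log above. Note that they were measured on the 16 examples the optimizer evaluated in each trial, not on the dev set used by evaluate.

```python
from typing import List, Tuple


def summarize_trials(scores: List[float]) -> Tuple[float, int, float]:
    """Return (best score, 1-based trial that first reached it, gain over trial 1)."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores)
    first_best_trial = scores.index(best) + 1
    gain = round(best - scores[0], 2)
    return best, first_best_trial, gain


# Copied from the "Scores so far" line of the final trial in the log.
scores_so_far = [60.34, 61.85, 58.79, 55.82, 50.58, 53.71, 63.6, 67.79, 62.04,
                 65.41, 67.79, 64.42, 67.79, 57.3, 58.93, 62.62, 63.39, 63.53,
                 60.06, 56.46, 58.87, 63.79, 67.79, 54.2, 67.79, 61.25]

best, trial, gain = summarize_trials(scores_so_far)
print(f"best={best}, first reached at trial {trial}, gain over default program={gain}")
```

The first entry is the default program's score, so the gain measures how much the best instruction and few-shot combination improved on the unoptimized program within the optimizer's own evaluation.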
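  • Evaluation score of the tuned RAG
4. Summary
This article introduced basic DSPy usage and walked through building, optimizing, and iterating on a technical question-answering RAG system in Colab.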

