As large language models (LLMs) grow more capable, artificial intelligence has taken a major step forward

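As large language models (LLMs) grow more capable, the field of artificial intelligence has taken a major step forward. These powerful language models show remarkable language understanding and generation abilities: they learn from and analyze large-scale text data automatically and produce accurate, fluent responses. This has also brought a glimmer of hope for AGI (artificial general intelligence).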

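1 Overview

Large language models achieve impressive results on a wide range of natural language processing (NLP) tasks. They are not a cure-all, however, and still face a number of limitations and challenges, such as:

Pre-training and fine-tuning require enormous compute resources and time, which ordinary users and developers can rarely afford.
They lack up-to-date knowledge and domain-specific knowledge, so a model may be unable to answer questions in certain domains or scenarios.
Hallucination: the model talks nonsense with a straight face, producing text that reads fluently and naturally but is factually wrong.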

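Prompts are an effective way to work around these challenges. A prompt acts like a wand that steers the model: it helps the model grasp the intent and task behind the input so that it produces output of the right type, topic, or format. The quality of a prompt directly affects the quality of the model's output and the user experience, which makes prompts critical when building LLM-based applications and systems.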

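So what is a prompt? A prompt is the piece of text given to a large language model as input to guide it toward text that meets specific requirements. Depending on the user's needs, it can be a short question, a longer passage, or a set of instructions. A large language model works by predicting, from the input text, the probability of the next word and generating the continuation piece by piece. It does not fully understand the prompt the way a human would; it generates output based on statistical regularities and the language model itself. For example, given the prompt "The weather today is very", the model may continue with weather-related words such as "sunny" or "gloomy". The prompt therefore directly affects the quality of the output, and even a small difference in wording can lead to completely different generated content.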
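
To make the next-word prediction idea concrete, here is a minimal Python sketch built around the weather example above. The probability table is invented purely for illustration (a real LLM computes such a distribution over its whole vocabulary), and `next_word_distribution` and `greedy_next_word` are hypothetical helper names.

```python
# Toy illustration of next-word prediction. The probabilities below are
# made up for this example; a real model computes them from its weights.
TOY_MODEL = {
    "The weather today is very": {"sunny": 0.5, "gloomy": 0.3, "expensive": 0.2},
    "The price today is very": {"expensive": 0.6, "gloomy": 0.3, "sunny": 0.1},
}


def next_word_distribution(prompt):
    """Return the (made-up) probability of each candidate next word."""
    return TOY_MODEL[prompt]


def greedy_next_word(prompt):
    """Pick the most probable next word for the prompt."""
    distribution = next_word_distribution(prompt)
    return max(distribution, key=distribution.get)


weather_word = greedy_next_word("The weather today is very")
price_word = greedy_next_word("The price today is very")
print(weather_word, price_word)
```

Changing a single word in the prompt changes which continuation is most likely, which is the sensitivity described above.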

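2 What Is DSPy

Pipelines in LLM-based applications are usually implemented with prompts, and it is the prompts that make the construction process fragile. Building such an application typically means breaking the problem into multiple steps, tuning the prompt for each step, adjusting the steps so they work together, generating synthetic examples to tune each step, and fine-tuning the language model so that it produces suitable output. The process is tedious and is easily disrupted by changes to the pipeline, the language model, or the data.

Whenever such a change occurs, prompts have to be revised through repeated trial and error, and hand-writing and hand-debugging prompts clearly cannot keep up with the demands of LLM application development. LangChain and other frameworks have since introduced prompt templates, which do make building LLM applications more efficient, but they are sensitive to changes in pipeline components and do not scale. How, then, can the fragility that prompts bring to LLM applications be addressed? This is where DSPy comes in.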

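DSPy (Declarative Self-improving Language Programs (in Python)) is a framework for algorithmically optimizing language model prompts and weights, developed by the Stanford NLP group. It emphasizes programming over prompting, moving the construction of LM-based pipelines away from manipulating prompts and closer to ordinary programming.

DSPy aims to solve the fragility problem in building LM (language model) based applications. Whenever you change a component, it lets you recompile the entire pipeline so that it is optimized for your specific task, sparing developers from continually hand-tuning prompts.

DSPy also separates the information flow of a program from the parameters of each step (the prompts and language model weights), which gives a more systematic way to build LM-based applications. Given your program, DSPy then automatically optimizes how to prompt (or fine-tune) the language model for your specific task.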

Figure: DSPy workflow. Source: LinkedIn, "DSPy: The Future of Programming Language Models"

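DSPy consists of three main components: signatures, modules, and optimizers (formerly called teleprompters). Its novelty lies in combining the three. Signatures give the language model guidance, while optimizers use the signatures together with a metric or evaluation system (possibly a language model acting as judge) to run experiments that determine better prompt text and the best small set of few-shot examples. With DSPy, developers can therefore focus on the task itself rather than on the details of prompt engineering, which can greatly improve development efficiency.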

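The workflow for building an LLM-based application with DSPy is as follows:

1. Collect a dataset: gather examples of the program's inputs and outputs (for example, questions and their answers, or topics and their summaries). These examples are used to optimize the pipeline.

2. Write the DSPy program: define the program's logic for solving the task using signatures, modules, and the information flow between components.

3. Define the validation logic: define how the program is optimized using a validation metric and an optimizer, and evaluate the pipeline based on its outputs and metric scores.

4. Compile the DSPy program: the DSPy compiler takes the training data, the program, the optimizer, and the validation metric into account to optimize the program (for example, its prompts or fine-tuning).

5. Iterate: repeat the process by improving the data, the program, or the validation until you are satisfied with the pipeline's performance.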
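
Steps 3 and 5 boil down to scoring a program on a set of examples and comparing the score between iterations. The following is a minimal plain-Python sketch of that evaluation loop. It does not use DSPy itself: `toy_program`, `exact_match`, and `evaluate` are hypothetical stand-ins for a real DSPy program, metric, and evaluator.

```python
# Plain-Python sketch of "evaluate a pipeline with a metric".
# Nothing here is DSPy API; all names are illustrative stand-ins.

def toy_program(question):
    """Stand-in for an LM pipeline: answers from a tiny lookup table."""
    known = {"2 + 2": "4", "capital of France": "Paris"}
    return known.get(question, "unknown")


def exact_match(example, prediction):
    """Metric: 1.0 if the prediction equals the gold answer, else 0.0."""
    return float(prediction.strip().lower() == example["answer"].strip().lower())


def evaluate(program, dataset, metric):
    """Average the metric over all examples in the dataset."""
    scores = [metric(ex, program(ex["question"])) for ex in dataset]
    return sum(scores) / len(scores)


devset = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "capital of France", "answer": "paris"},
    {"question": "capital of Spain", "answer": "Madrid"},
    {"question": "3 * 3", "answer": "9"},
]

score = evaluate(toy_program, devset, exact_match)
print(f"average score: {score:.2f}")
```

The toy program answers two of the four questions correctly, so the average score is 0.5. In an iteration loop, this number is what you would try to raise by improving the data, the program, or the metric.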

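3 Key Techniques

This section introduces DSPy's three basic components, signatures, modules, and optimizers, along with two further key techniques: metrics and assertions.

3.1 Signatures

A signature is a declarative specification of the input/output behavior of a DSPy module. Its purpose is to give a minimal description of the task or subtask and of its inputs and outputs. In simple cases a signature can be a short string, and it can contain multiple input/output fields. The field names define the semantic roles of the inputs and outputs.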

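Single input/output field

1. Question answering: "question -> answer"

2. Sentiment classification: "sentence -> sentiment"

3. Summarization: "document -> summary"

Multiple input/output fields

1. Retrieval-augmented question answering: "context, question -> answer"

2. Multiple-choice question answering with reasoning: "question, choices -> reasoning, selection"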
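
To show how little structure the string shorthand carries, here is a small sketch that splits such a string into input and output field names. It is not DSPy's own parser, only an illustration of the convention; `parse_signature` is a hypothetical helper.

```python
# Illustrative parser for the "inputs -> outputs" shorthand.
# This is NOT DSPy's own implementation, just a sketch of the idea.

def parse_signature(signature):
    """Split 'a, b -> c, d' into (['a', 'b'], ['c', 'd'])."""
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    left, right = signature.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


qa_fields = parse_signature("question -> answer")
rag_fields = parse_signature("context, question -> answer")
mcq_fields = parse_signature("question, choices -> reasoning, selection")
print(qa_fields, rag_fields, mcq_fields)
```

Everything to the left of the arrow is an input, everything to the right is an output, and the names themselves are the only semantic hint the model receives.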

Example: sentiment classification

sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.

classify = dspy.Predict('sentence -> sentiment')
classify(sentence=sentence).sentiment

Output:

'Positive'

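For more advanced tasks, a signature can be defined as a class. This makes it possible to give extra hints about the task itself (see the docstring in the signature below) or about individual input or output fields (supplied through the desc keyword argument), as shown below. These field names and descriptions help the LM-driven system understand the task better.

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

# Inside a module's __init__ method, attach the signature to a module:
self.generate_answer = dspy.ChainOfThought(GenerateSearchQuery)

Signatures are usually chained together. For example, one signature may express the intent of querying data from a retrieval model, and a second may express the intent of using the retrieved data/context together with the question to generate an answer for the user.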

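3.2 Modules

DSPy modules are the basic building blocks for programs that use language models (LMs).

Each built-in module abstracts a prompting technique (such as Chain of Thought or ReAct). Each module is associated with a natural-language signature and implements the corresponding prompting flow internally.
Each module has adjustable parameters (the small pieces that make up the prompt and the LM weights) that shape the prompt content and the LM's behavior, so that it can interact with inputs dynamically and produce outputs.
DSPy modules can also be composed into arbitrary pipelines to form more complex programs.

DSPy provides seven built-in modules for different purposes: dspy.ReAct, dspy.ChainOfThought, dspy.ChainOfThoughtWithHint, dspy.Predict, dspy.ProgramOfThought, dspy.MultiChainComparison, and dspy.Retrieve.

DSPy modules (with the exception of dspy.Predict) use and extend the information provided by the signature. For example, dspy.ChainOfThought adds a rationale field that holds the language model's reasoning before it generates the output.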
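
The idea of a module extending a signature can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how DSPy implements dspy.ChainOfThought; `extend_with_rationale` and `render_template` are hypothetical helpers, and the rendered template is a simplification of a real prompt.

```python
# Sketch of how a chain-of-thought style module could extend a signature.
# Not DSPy's implementation; field and function names are illustrative.

def extend_with_rationale(inputs, outputs):
    """Insert a 'rationale' output field before the original outputs."""
    return list(inputs), ["rationale"] + list(outputs)


def render_template(inputs, outputs):
    """Render one 'Field:' line per field, in the order the LM fills them."""
    return "\n".join(f"{name.capitalize()}:" for name in inputs + outputs)


plain_inputs, plain_outputs = ["question"], ["answer"]
cot_inputs, cot_outputs = extend_with_rationale(plain_inputs, plain_outputs)

plain_template = render_template(plain_inputs, plain_outputs)
cot_template = render_template(cot_inputs, cot_outputs)
print(cot_template)
```

Because the rationale field comes before the answer field, the model has to write out its reasoning first, which is the essence of chain-of-thought prompting.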

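3.3 Optimizers

An optimizer is an algorithm that tunes the parameters of a DSPy program (the prompts and/or the LM weights) to maximize a metric you specify, such as accuracy. A typical DSPy optimizer takes three inputs:

DSPy program: this may be a single module (such as dspy.Predict) or a complex multi-module program.
Metric: a function that evaluates the program's output and assigns it a score (higher is better).
Training inputs: a small number of examples, which may be incomplete (program inputs only, without any labels).

DSPy currently implements the following optimizers:

Automatic few-shot learning: LabeledFewShot, BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFewShotWithOptuna, KNNFewShot.
Automatic instruction optimization: COPRO and MIPRO.
Automatic fine-tuning: BootstrapFinetune.
Program transformations: Ensemble.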
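
To give a feel for what a bootstrapping few-shot optimizer does with its three inputs, here is a heavily simplified plain-Python sketch: run the program on training examples and keep those whose output passes the metric as demonstrations. This was written for this article and is not DSPy's actual algorithm; `bootstrap_demos`, `toy_program`, and `exact_match` are hypothetical names.

```python
# Conceptual sketch of bootstrapping few-shot demonstrations.
# A simplification for illustration, not DSPy's algorithm as implemented.

def bootstrap_demos(program, trainset, metric, max_demos):
    """Keep training examples on which the program's output passes the metric."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) == max_demos:
            break
    return demos


def toy_program(question):
    """Stand-in for an LM call: solves 'a + b' questions, fails on others."""
    parts = question.split("+")
    if len(parts) != 2:
        return "unknown"
    return str(int(parts[0]) + int(parts[1]))


def exact_match(example, prediction):
    """Metric: True when the prediction equals the gold answer."""
    return prediction == example["answer"]


trainset = [
    {"question": "1 + 1", "answer": "2"},
    {"question": "2 * 3", "answer": "6"},  # the toy program fails here
    {"question": "4 + 5", "answer": "9"},
    {"question": "7 + 8", "answer": "15"},
]

demos = bootstrap_demos(toy_program, trainset, exact_match, max_demos=2)
print(demos)
```

Only examples the program already handles correctly become demonstrations, and the loop stops once the requested number has been collected.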

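3.4 Metrics

A metric is a function that takes an example from your data and the output of your system and returns a score quantifying how good the output is; higher is better. The metric has a large influence on the DSPy experience: it determines not only the final quality assessment but also the outcome of optimization. A metric function takes three arguments: an example from the dataset (example), the program's output (pred), and an optional trace. It returns a float, int, or bool score.

The metric below returns a float when trace is None (that is, during evaluation or optimization) and a bool otherwise (that is, when bootstrapping demonstrations).

def validate_context_and_answer(example, pred, trace=None):
    # check the gold label and the predicted answer are the same
    answer_match = example.answer.lower() == pred.answer.lower()

    # check the predicted answer comes from one of the retrieved contexts
    context_match = any((pred.answer.lower() in c) for c in pred.context)

    if trace is None:  # if we're doing evaluation or optimization
        return (answer_match + context_match) / 2.0
    else:  # if we're doing bootstrapping, i.e. self-generating good demonstrations of each step
        return answer_match and context_match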
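
The following sketch exercises this metric without calling any language model. The metric is repeated so the block runs on its own; the `example` and prediction objects are hypothetical stand-ins built with `types.SimpleNamespace` rather than real DSPy objects.

```python
from types import SimpleNamespace


def validate_context_and_answer(example, pred, trace=None):
    # Same logic as the metric above, repeated so this block is self-contained.
    answer_match = example.answer.lower() == pred.answer.lower()
    context_match = any((pred.answer.lower() in c) for c in pred.context)
    if trace is None:
        return (answer_match + context_match) / 2.0
    return answer_match and context_match


# Hypothetical stand-ins for a dataset example and two program predictions.
example = SimpleNamespace(question="What color is the sky?", answer="Blue")
good_pred = SimpleNamespace(answer="blue", context=["the sky looks blue at noon"])
ungrounded_pred = SimpleNamespace(answer="blue", context=["grass is green"])

eval_score = validate_context_and_answer(example, good_pred)
partial_score = validate_context_and_answer(example, ungrounded_pred)
bootstrap_ok = validate_context_and_answer(example, ungrounded_pred, trace=[])
print(eval_score, partial_score, bootstrap_ok)
```

During evaluation, a correct answer that is not supported by the retrieved context still earns half credit (0.5). During bootstrapping, the same prediction is rejected outright, so only fully valid demonstrations are kept.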

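3.5 Assertions

In DSPy, assertions are program elements that define conditions or rules that must hold during execution of a language model pipeline. These constraints ensure that the pipeline's behavior conforms to developer-specified invariants or guidelines, improving the reliability, predictability, and correctness of its output.

LM assertions come in two well-defined programming constructs, assertions and suggestions, expressed as Assert and Suggest. Both are constructs that enforce constraints on, and guide, the execution flow of an LM pipeline.

Unlike a traditional assertion (which checks a condition and raises an exception if it is false), DSPy provides a sophisticated retry mechanism and supports a number of new optimizations. When an Assert fails, the pipeline transitions to a special retry state in which it can re-attempt the failed LM call while being aware of the previous attempts and the error message raised. If the assertion still fails after the maximum number of self-refinement attempts, the pipeline transitions to an error state and raises an AssertionError, which terminates the pipeline.

Compared with Assert, Suggest is a softer constraint: it recommends a condition without enforcing it, nudging the LM pipeline toward the desired domain-specific outcome. When a Suggest condition is not met, the pipeline enters the same special retry state as for Assert, allowing the failed LM call to be re-attempted and refined. However, if the suggestion still fails after the maximum number of self-refinement attempts, the pipeline merely logs a SuggestionError warning and continues. This lets the pipeline adapt its behavior in response to suggestions while staying flexible and resilient when faced with suboptimal states (or suboptimal or heuristic computational checks).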
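
The difference between the two constructs can be sketched in plain Python as a retry loop with a hard and a soft failure mode. This is a conceptual illustration only, not DSPy's implementation: `run_with_constraint`, `toy_generate`, `always_long`, and `is_short` are hypothetical names, and the standard `warnings` module stands in for DSPy's logging.

```python
# Conceptual sketch of hard vs. soft constraints with retries.
# Not DSPy's implementation; all names are illustrative.
import warnings


def run_with_constraint(generate, check, message, max_retries=2, hard=True):
    """Call generate(feedback) until check passes or the retries run out."""
    feedback = None
    output = None
    for _attempt in range(max_retries + 1):
        output = generate(feedback)
        if check(output):
            return output
        feedback = message  # the next attempt sees the error message
    if hard:
        raise AssertionError(message)  # Assert-like: stop the pipeline
    warnings.warn(message)             # Suggest-like: warn and carry on
    return output


def toy_generate(feedback):
    """Stand-in for an LM call: shortens its query only after feedback."""
    if feedback is None:
        return "a very long search query " * 10
    return "short query"


def always_long(feedback):
    """Stand-in for an LM call that never satisfies the constraint."""
    return "x" * 200


def is_short(query):
    return len(query) <= 100


fixed = run_with_constraint(toy_generate, is_short, "Query should be short")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soft_result = run_with_constraint(
        always_long, is_short, "Query should be short", hard=False
    )

try:
    run_with_constraint(always_long, is_short, "Query should be short", hard=True)
    hard_failed = False
except AssertionError:
    hard_failed = True

print(fixed, len(soft_result), hard_failed)
```

In the first call the retry succeeds because the second attempt receives the error message as feedback. In the soft case the overlong output is returned with a warning, while the hard case ends in an AssertionError.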

An example SimplifiedBaleen program with assertions:

# GenerateSearchQuery is the signature from section 3.1. GenerateAnswer,
# validate_query_distinction_local, deduplicate and all_queries_distinct
# are assumed to be defined elsewhere.
class SimplifiedBaleenAssertions(dspy.Module):
    def __init__(self, passages_per_hop=2, max_hops=2):
        super().__init__()
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
        # Counter incremented in forward(); initialized here so that the
        # increment does not fail on first use.
        self.passed_suggestions = 0

    def forward(self, question):
        context = []
        prev_queries = [question]

        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query

            # Soft constraint: keep the query short.
            dspy.Suggest(
                len(query) <= 100,
                "Query should be short and less than 100 characters",
            )

            # Soft constraint: do not repeat earlier queries.
            dspy.Suggest(
                validate_query_distinction_local(prev_queries, query),
                "Query should be distinct from: "
                + "; ".join(f"{i+1}) {q}" for i, q in enumerate(prev_queries)),
            )

            prev_queries.append(query)
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        if all_queries_distinct(prev_queries):
            self.passed_suggestions += 1

        pred = self.generate_answer(context=context, question=question)
        pred = dspy.Prediction(context=context, answer=pred.answer)
        return pred

4 Practice

4.1 Installing DSPy

Install the dspy-ai Python package with pip install.

pip install dspy-ai

To install the latest version from main:

pip install git+https://github.com/stanfordnlp/dspy.git

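4.2 A Minimal Chain-of-Thought Example

We start with the simplest possible chain-of-thought (CoT) question-answering run using DSPy. The example involves the dspy.Signature and dspy.ChainOfThought classes: dspy.Signature defines a module's inputs and outputs, and dspy.ChainOfThought is DSPy's built-in module that interacts with the LLM in CoT mode. The code is as follows:

import dspy

# Define and configure the LLM
model_name = 'llama3'
lm = dspy.OllamaLocal(model=model_name)
dspy.settings.configure(lm=lm)

# Define the input and output fields with a class
class QA(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()

question = "what is the color of the sea?"
summarize = dspy.ChainOfThought(QA)
response = summarize(question=question)
print(f"Question: {question}\nAnswer: {response.answer}")

The code above first sets the LLM to llama3, then defines a dspy.Signature subclass with an input field question and an output field answer, and finally instantiates dspy.ChainOfThought and calls the model with a question. The result is:

## class-based input/output definition - start ##
Question: what is the color of the sea?
Answer: The color of the sea is typically perceived as blue.
## class-based input/output definition - end ##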

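In addition, a dspy.Signature can be defined inline, by describing the input and output fields in a short string. As shown below, the second line replaces the original QA class with the string "question->answer", which is automatically converted into a dspy.Signature:

question = "what is the color of the sky?"
summarize = dspy.ChainOfThought('question->answer')
response = summarize(question=question)

The result is:

## inline input/output definition - start ##
Question: what is the color of the sky?
Answer: Blue
## inline input/output definition - end ##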

To see the full prompt used in CoT mode, run the following code:

lm.inspect_history(n=1)

The result is:

Question: what is the color of the sky?
Reasoning: Let's think step by step in order to Question: what is the color of the sky?
Reasoning: Let's think step by step in order to determine the color of the sky. The sky appears blue due to the scattering of light waves in the atmosphere. The blue light is scattered more efficiently than other colors of light, which is why the sky appears blue.
Answer: Blue

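4.3 Adding Examples to the Prompt

As is well known, changing the content of a prompt changes what the LLM returns, and one common way to adjust a prompt is to add examples to it. How, then, do we add examples in CoT mode? Here we introduce dspy.Example, DSPy's data class for building examples. The code is as follows:

import dspy

model_name = 'llama3'
lm = dspy.OllamaLocal(model=model_name)
dspy.settings.configure(lm=lm)

question = "what is the color of sky at night?"

# Example content
example = dspy.Example(question="what is the color of sky?", answer="the color of sky is blue, even at night")

summarize = dspy.ChainOfThought('question->answer')
response = summarize(question=question, demos=[example])
print(f"Question: {question}\nAnswer: {response.answer}")

As the code shows, it builds an example instance and passes it to the dspy.ChainOfThought call. The result is:

Question: what is the color of sky at night?
Answer: ...the color of the sky at night is still blue!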

Looking at the prompt in more detail:

---
Question: what is the color of sky?
Answer: the color of sky is blue, even at night
---
Question: what is the color of sky at night?
Reasoning: Let's think step by step in order to Question: what is the color of sky at night?
Reasoning: Let's think step by step in order to answer this question. We know that during the day, the color of the sky is blue, and we also know that the color of the sky remains relatively consistent even after sunset. Therefore...
Answer: ...the color of the sky at night is still blue!

As you can see, the example was added to the prompt, and it changed the final answer to the question.
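
The way demonstrations end up in front of the actual question can be sketched with a small prompt builder. The exact template DSPy produces may differ between versions; this sketch only mirrors the "---"-separated layout visible in the prompt above, and `build_prompt` is a hypothetical helper.

```python
# Sketch of assembling a few-shot prompt from demonstrations.
# It mirrors the "---"-separated layout shown above; it is not DSPy's
# own template code.

def build_prompt(demos, question):
    """Place each demo before the new question, separated by '---' lines."""
    blocks = [f"Question: {d['question']}\nAnswer: {d['answer']}" for d in demos]
    blocks.append(
        f"Question: {question}\nReasoning: Let's think step by step in order to"
    )
    return "---\n" + "\n---\n".join(blocks)


demos = [
    {
        "question": "what is the color of sky?",
        "answer": "the color of sky is blue, even at night",
    }
]
prompt = build_prompt(demos, "what is the color of sky at night?")
print(prompt)
```

Since the model simply continues this text, whatever the demonstration asserts (here, that the sky is blue even at night) strongly biases the continuation.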

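4.4 Building an Example Dataset

As the number of examples grows, we would like to keep them in a dataset and pass the whole dataset into the CoT inference. Here we use GSM8K, a dataset built into DSPy for multi-step mathematical reasoning. An example:

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

# Load the dataset and keep the first 20 training examples
gsm8k = GSM8K()
gsm8k_trainset = gsm8k.train[:20]

model_name = 'llama3'
lm = dspy.OllamaLocal(model=model_name, timeout_s=1000)
dspy.settings.configure(lm=lm)

question = (
    "Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. "
    "Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served "
    "at least three years have to pay for shoes?"
)

# Example content: pass the training examples as demos
summarize = dspy.ChainOfThought('question->answer')
response = summarize(question=question, demos=gsm8k_trainset)
print(f"Question: {question}\nAnswer: {response.answer}")

Here the dataset is passed in through the demos argument and a math question is asked. The result is:

Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Answer: 296

This answer is actually wrong; the correct answer is 51. So although prompts can improve an LLM's output, they can hardly guarantee that the answer is correct. The prompt is as follows: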

---
Question: The result from the 40-item Statistics exam Marion and Ella took already came out. Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella. What is Marion's score?
Answer: 24
---
Question: Stephen made 10 round trips up and down a 40,000 foot tall mountain. If he reached 3/4 of the mountain's height on each of his trips, calculate the total distance he covered.
Answer: 600000
---
Question: Bridget counted 14 shooting stars in the night sky. Reginald counted two fewer shooting stars than did Bridget, but Sam counted four more shooting stars than did Reginald. How many more shooting stars did Sam count in the night sky than was the average number of shooting stars observed for the three of them?
Answer: 2
---
Question: Sarah buys 20 pencils on Monday. Then she buys 18 more pencils on Tuesday. On Wednesday she buys triple the number of pencils she did on Tuesday. How many pencils does she have?
Answer: 92
---
Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Answer: 51
---
Question: The average score on last week's Spanish test was 90. Marco scored 10% less than the average test score and Margaret received 5 more points than Marco. What score did Margaret receive on her test?
Answer: 86
---
Question: A third of the contestants at a singing competition are female, and the rest are male. If there are 18 contestants in total, how many of them are male?
Answer: 12
---
Question: Nancy bought a pie sliced it into 8 pieces. She gave 1/2 to Joe and Darcy, and she gave 1/4 to Carl. How many slices were left?
Answer: 2
---
Question: Megan pays $16 for a shirt that costs $22 before sales. What is the amount of the discount?
Answer: 6
---
Question: Amaya scored 20 marks fewer in Maths than she scored in Arts. She also got 10 marks more in Social Studies than she got in Music. If she scored 70 in Music and scored 1/10 less in Maths, what's the total number of marks she scored in all the subjects?
Answer: 296
---
Question: Betty and Dora started making some cupcakes at the same time. Betty makes 10 cupcakes every hour and Dora makes 8 every hour. If Betty took a two-hour break, what is the difference between the number of cupcakes they made after 5 hours?
Answer: 10
---
Question: Alice and Bob are each given $2000 to invest. Alice puts all of her money in the stock market and doubles her money. Bob invests in real estate and makes five times more money than he invested. How much more money does Bob have now than Alice?
Answer: 8000
---
Question: A tank contains 6000 liters of water, 2000 liters evaporated, and then 3500 liters were drained by Bob. How many liters are in the tank if it now rains for 30 minutes and every 10 minutes 350 liters of rain are added to the tank?
Answer: 1550
---
Question: John takes 3 days off of streaming per week. On the days he does stream, he streams for 4 hours at a time and makes $10 an hour. How much does he make a week?
Answer: 160
---
Question: Billy is breeding mice for an experiment. He starts with 8 mice, who each have 6 pups. When the pups grow up, all the mice have another 6 pups. Then each adult mouse eats 2 of their pups due to the stress of overcrowding. How many mice are left?
Answer: 280
---
Question: John needs to replace his shoes so he decides to buy a $150 pair of Nikes and a $120 pair of work boots. Tax is 10%. How much did he pay for everything?
Answer: 297
---
Question: Daryl is loading crates at a warehouse and wants to make sure that they are not overloaded. Each crate can weigh up to 20kg and he has 15 crates he can fill. He has 4 bags of nails to load, each of which weighs 5kg; he has 12 bags of hammers, each of which weighs 5kg; he also has 10 bags of wooden planks, each of which weighs 30kg and can be sub-divided. He realizes that he has too much to load and will have to leave some items out of the crates to meet the weight limit. In kg, how much is Daryl going to have to leave out of the crates?
Answer: 80
---
Question: Tom's rabbit can run at 25 miles per hour. His cat can run 20 miles per hour. The cat gets a 15-minute head start. In hours, how long will it take for the rabbit to catch up?
Answer: 1
---
Question: In 2004, there were some kids at a cookout. In 2005, half the number of kids came to the cookout as compared to 2004. In 2006, 2/3 as many kids came to the cookout as in 2005. If there were 20 kids at the cookout in 2006, how many kids came to the cookout in 2004?
Answer: 60
---
Question: James splits 4 packs of stickers that have 30 stickers each. Each sticker cost $.10. If his friend pays for half how much did James pay?
Answer: 6
---
Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Reasoning: Let's think step by step in order to I'll generate the answers based on the given questions. Here they are:
---
Question: Amaya scored 20 marks fewer in Maths than she scored in Arts. She also got 10 marks more in Social Studies than she got in Music. If she scored 70 in Music and scored 1/10 less in Maths, what's the total number of marks she scored in all the subjects?
Answer: 296
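
The expected answer of 51 for the duty-shoes question can be checked with a few lines of exact arithmetic. The variable names are illustrative, and `fractions.Fraction` is used only to avoid floating-point rounding.

```python
# Check the expected answer to the duty-shoes question with exact arithmetic.
from fractions import Fraction

full_price = Fraction(85)
after_one_year = full_price * (1 - Fraction(20, 100))         # 20% discount
after_three_years = after_one_year * (1 - Fraction(25, 100))  # extra 25% off

print(after_one_year, after_three_years)  # prints: 68 51
```

The 20% discount brings the price to $68, and the additional 25% off that discounted price gives $51. The value 296 returned by the model is the answer to a different demonstration in the prompt (the Amaya question), which the model reproduced in its own output.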

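4.5 Automatically Optimizing the Template

While adding examples we ran into two problems: (1) adding examples does not guarantee that the LLM predicts correctly; (2) when the LLM changes, the existing prompt may no longer be suitable. To address both, DSPy can automatically optimize prompts and LLM parameters, which can further improve accuracy as well as stability across different LLMs. The user only needs to provide a training dataset for the target domain (Dataset), a measure of how accurate the LLM's results are (Metrics), and an optimizer (Optimizer) to automatically obtain the prompt and model parameters best suited to the target dataset. The code is as follows:

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot

# Define and configure the LLM
model_name = 'llama3'
lm = dspy.OllamaLocal(model=model_name, timeout_s=1000)
dspy.settings.configure(lm=lm)

# First 20 training examples of GSM8K, as in section 4.4
gsm8k_trainset = GSM8K().train[:20]

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question->answer")

    def forward(self, question):
        return self.prog(question=question)

config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
# The size of the training set can be adjusted
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)
optimized_cot.save("./test.json")

question = "Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?"
response = optimized_cot(question=question)
print(f"Question: {question}\nAnswer: {response.answer}")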

The code above first imports the built-in optimizer BootstrapFewShot and the metric function gsm8k_metric, and takes the first 20 records of the gsm8k dataset as gsm8k_trainset. It then runs the optimization with BootstrapFewShot.compile, which internally keeps the examples that help answer the question along with the LLM parameters. Finally it asks the question. The result is:

Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Answer: 51

The final prompt is shown below. As you can see, four examples from the training set were retained in the end:

---
Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Answer: 51
---
Question: James splits 4 packs of stickers that have 30 stickers each. Each sticker cost $.10. If his friend pays for half how much did James pay?
Answer: 6
---
Question: In 2004, there were some kids at a cookout. In 2005, half the number of kids came to the cookout as compared to 2004. In 2006, 2/3 as many kids came to the cookout as in 2005. If there were 20 kids at the cookout in 2006, how many kids came to the cookout in 2004?
Answer: 60
---
Question: Sarah buys 20 pencils on Monday. Then she buys 18 more pencils on Tuesday. On Wednesday she buys triple the number of pencils she did on Tuesday. How many pencils does she have?
Answer: 92
---
Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Reasoning: Let's think step by step in order to Question: Rookie police officers have to buy duty shoes at the full price of $85, but officers who have served at least a year get a 20% discount. Officers who have served at least three years get an additional 25% off the discounted price. How much does an officer who has served at least three years have to pay for shoes?
Answer: 51

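5 Summary

DSPy addresses the problems that prompt fragility causes when developing LLM applications. It emphasizes designing prompts as part of an overall system, which helps keep the system well documented and extensible. Through its modular design, DSPy breaks complex tasks into multiple modules, each with a clear responsibility and interface, making the system easier to maintain and extend. Through continual automatic optimization, DSPy keeps adjusting and improving prompts and models so that the system steadily gets better. With DSPy, developers can focus on the task itself rather than on the details of prompt engineering, which can greatly improve development efficiency.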