|
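The earlier article, "Prompt或许的新未来, DSPy使用从0到1快速上手" (roughly: "Perhaps the New Future of Prompts: Getting Started with DSPy from 0 to 1"), took an example-driven approach, introducing the DSPy workflow step by step and then showing cases that combine DSPy with LangChain, the well-known LLM application framework. This article explains how DSPy works internally, using examples and source-code analysis.

DSPy is a framework from the Stanford NLP group for generating and optimizing prompts. It replaces hand-written prompts with a code-centric programming model. Its core ideas are:
(1) Separate the overall flow from the individual steps. Each step focuses on one concrete job, and together they carry out the prompt optimization.
(2) Introduce a variety of optimizers. These are LM-driven algorithms that tune the prompts and parameters of LM calls according to a user-defined metric.
These ideas are implemented mainly by three core modules: Signature, Module, and Optimizer.

The Signature class is one of DSPy's core modules. A Signature is a declarative specification of a DSPy module's input/output behavior: it tells the language model what task to perform, rather than how we should prompt the model.

Basic Signature usage example

Let's start with an example. A signature needs three elements, so the code below does three things:
1. Put the subtask description in the docstring directly under the class definition.
2. Define the input fields (required) and give each a description (optional).
3. Define the output fields (required) and give each a description (optional).

```python
import dspy


class QA(dspy.Signature):
    """answer the question of user"""

    user_question = dspy.InputField(desc="用户的问题")  # desc: "the user's question"
    answer = dspy.OutputField()
```

Next we run one inference with dspy.ChainOfThought and look at the final prompt:

```python
# Assumes `lm` is a DSPy language model already registered via dspy.settings.configure(lm=lm).
question = "what is the color of the sea?"
summarize = dspy.ChainOfThought(QA)
# The recorded run passed `question=`; the input field declared in QA is `user_question`.
response = summarize(question=question)
# Inspect the prompt
lm.inspect_history(n=1)
```

The prompt is shown below. It shows that Signature turns the class docstring into the subtask description at the top of the prompt, then lays out the input and output fields in a uniform format (each word capitalized, words separated by spaces) before and after Reasoning. The Reasoning text is the fixed prompt used by CoT mode.

```text
answer the question of user
---
Follow the following format.
User Question: 用户的问题
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
Question: what is the color of the sea?
Reasoning: Let's think step by step in order to Question: what is the color of the sky?
Reasoning: Let's think step by step in order to determine the color of the sky. The sky appears blue due to the scattering of light waves in the atmosphere. The blue light is scattered more efficiently than other colors of light, which is why the sky appears blue.
Answer: Blue
```

The example above shows how to produce a prompt from code, so that the prompt includes our task description and the descriptions of the input and output parameters. A Signature can also be defined with a string instead of by inheritance:

```python
summarize = dspy.ChainOfThought('question->answer')
```

A bare string like this carries only the input and output field names, with no task description and no field descriptions, so the prompt falls back to the default task description:

```text
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
```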
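To make the mapping from a string signature to the prompt header concrete, here is a minimal standard-library sketch. It is not DSPy's code: `parse_fields`, `to_prefix`, and `render_prompt_header` are hypothetical helper names, the prefix rule is a simplified stand-in, the way multiple field names are joined is an assumption, and the Reasoning line that dspy.ChainOfThought adds is left out.

```python
from typing import List, Tuple


def parse_fields(signature: str) -> Tuple[List[str], List[str]]:
    """Split "a, b -> c" into input names and output names (hypothetical helper)."""
    if signature.count("->") != 1:
        raise ValueError(f"Invalid signature format: '{signature}'")
    inputs_str, outputs_str = signature.split("->")
    inputs = [v.strip() for v in inputs_str.split(",") if v.strip()]
    outputs = [v.strip() for v in outputs_str.split(",") if v.strip()]
    return inputs, outputs


def to_prefix(name: str) -> str:
    """Simplified stand-in for prefix inference: snake_case -> 'Title Case:'."""
    return " ".join(word.capitalize() for word in name.split("_")) + ":"


def render_prompt_header(signature: str) -> str:
    """Build the default task description plus the field format block."""
    inputs, outputs = parse_fields(signature)
    # Assumption: several field names are joined with ", ".
    in_list = ", ".join(f"`{n}`" for n in inputs)
    out_list = ", ".join(f"`{n}`" for n in outputs)
    lines = [
        f"Given the fields {in_list}, produce the fields {out_list}.",
        "---",
        "Follow the following format.",
    ]
    # With no description given, each field falls back to a ${name} placeholder.
    for name in inputs + outputs:
        lines.append(f"{to_prefix(name)} ${{{name}}}")
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_prompt_header("question -> answer"))
```

Running it on `"question -> answer"` prints the same instruction line and the `Question:` / `Answer:` field lines as the recorded prompt, minus the Reasoning line.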
Signature source code walkthrough

So how does the Signature class implement all of this?
1. Signature execution flow
2. How a string is converted into a Signature class

Let's step into the Signature code for a brief explanation (file location: dspy\signatures):

```python
# signature.py (partially trimmed)
import ast
import re
import types
import typing
from copy import deepcopy
from typing import Any, Dict, Tuple, Type, Union  # noqa: UP035

from pydantic import BaseModel, Field, create_model
from pydantic.fields import FieldInfo

import dsp
from dspy.signatures.field import InputField, OutputField, new_to_old_field


class SignatureMeta(type(BaseModel)):  # Metaclass for Signature
    def __call__(cls, *args, **kwargs):  # noqa: ANN002
        if cls is Signature:
            return make_signature(*args, **kwargs)
        return super().__call__(*args, **kwargs)

    def __new__(mcs, signature_name, bases, namespace, **kwargs):  # noqa: N804
        # Called whenever a Signature class is created
        # Set `str` as the default type for all fields
        raw_annotations = namespace.get("__annotations__", {})
        for name, field in namespace.items():
            if not isinstance(field, FieldInfo):
                continue  # Don't add types to non-field attributes
            if not name.startswith("__") and name not in raw_annotations:
                raw_annotations[name] = str
        namespace["__annotations__"] = raw_annotations

        # Let Pydantic do its thing
        cls = super().__new__(mcs, signature_name, bases, namespace, **kwargs)

        # If we don't have instructions, it might be because we are a derived generic type.
        # In that case, we should inherit the instructions from the base class.
        if cls.__doc__ is None:
            for base in bases:
                if isinstance(base, SignatureMeta):
                    doc = getattr(base, "__doc__", "")
                    if doc != "":
                        cls.__doc__ = doc

        # The more likely case is that the user has just not given us a type.
        # In that case, we should default to the input/output format.
        if cls.__doc__ is None:
            cls.__doc__ = _default_instructions(cls)

        # Ensure all fields are declared with InputField or OutputField
        cls._validate_fields()

        # Ensure all fields have a prefix
        for name, field in cls.model_fields.items():
            if "prefix" not in field.json_schema_extra:
                field.json_schema_extra["prefix"] = infer_prefix(name) + ":"
            if "desc" not in field.json_schema_extra:
                field.json_schema_extra["desc"] = f"${{{name}}}"

        return cls

    ...


class Signature(BaseModel, metaclass=SignatureMeta):
    ""  # noqa: D419

    # Note: Don't put a docstring here, as it will become the default instructions
    # for any signature that doesn't define it's own instructions.
    pass


def make_signature(  # Create a Signature type from the given arguments
    signature: Union[str, Dict[str, Tuple[type, FieldInfo]]],
    instructions: str = None,
    signature_name: str = "StringSignature",
) -> Type[Signature]:
    """Create a new Signature type with the given fields and instructions.

    Note:
        Even though we're calling a type, we're not making an instance of the type.
        In general, instances of Signature types are not allowed to be made. The call
        syntax is provided for convenience.

    Args:
        signature: The signature format, specified as "input1, input2 -> output1, output2".
        instructions: An optional prompt for the signature.
        signature_name: An optional name for the new signature type.
    """
    fields = _parse_signature(signature) if isinstance(signature, str) else signature

    # Validate the fields, this is important because we sometimes forget the
    # slightly unintuitive syntax with tuples of (type, Field)
    fixed_fields = {}
    for name, type_field in fields.items():
        if not isinstance(name, str):
            raise ValueError(f"Field names must be strings, not {type(name)}")
        if isinstance(type_field, FieldInfo):
            type_ = type_field.annotation
            field = type_field
        else:
            if not isinstance(type_field, tuple):
                raise ValueError(f"Field values must be tuples, not {type(type_field)}")
            type_, field = type_field
        # It might be better to be explicit about the type, but it currently would break
        # program of thought and teleprompters, so we just silently default to string.
        if type_ is None:
            type_ = str
        # if not isinstance(type_, type) and not isinstance(typing.get_origin(type_), type):
        if not isinstance(type_, (type, typing._GenericAlias, types.GenericAlias)):
            raise ValueError(f"Field types must be types, not {type(type_)}")
        if not isinstance(field, FieldInfo):
            raise ValueError(f"Field values must be Field instances, not {type(field)}")
        fixed_fields[name] = (type_, field)

    # Fixing the fields shouldn't change the order
    assert list(fixed_fields.keys()) == list(fields.keys())  # noqa: S101

    # Default prompt when no instructions are provided
    if instructions is None:
        sig = Signature(signature, "")  # Simple way to parse input/output fields
        instructions = _default_instructions(sig)

    return create_model(
        signature_name,
        __base__=Signature,
        __doc__=instructions,
        **fixed_fields,
    )


def _parse_signature(signature: str) -> Tuple[Type, Field]:
    # Parse the string form of the inputs/outputs into field objects
    if signature.count("->") != 1:
        raise ValueError(f"Invalid signature format: '{signature}', must contain exactly one '->'.")

    fields = {}
    inputs_str, outputs_str = map(str.strip, signature.split("->"))
    inputs = [v.strip() for v in inputs_str.split(",") if v.strip()]
    outputs = [v.strip() for v in outputs_str.split(",") if v.strip()]
    for name_type in inputs:
        name, type_ = _parse_named_type_node(name_type)
        fields[name] = (type_, InputField())
    for name_type in outputs:
        name, type_ = _parse_named_type_node(name_type)
        fields[name] = (type_, OutputField())

    return fields


def infer_prefix(attribute_name: str) -> str:
    # Normalize how a field name appears in the prompt, e.g. capitalize each word
    """Infer a prefix from an attribute name."""
    # Convert camelCase to snake_case, but handle sequences of capital letters properly
    s1 = re.sub("(.)([A-Z][a-z]+)", r"\1_\2", attribute_name)
    intermediate_name = re.sub("([a-z0-9])([A-Z])", r"\1_\2", s1)

    # Insert underscores around numbers to ensure spaces in the final output
    with_underscores_around_numbers = re.sub(
        r"([a-zA-Z])(\d)",
        r"\1_\2",
        intermediate_name,
    )
    with_underscores_around_numbers = re.sub(
        r"(\d)([a-zA-Z])",
        r"\1_\2",
        with_underscores_around_numbers,
    )

    # Convert snake_case to 'Proper Title Case', but ensure acronyms are uppercased
    words = with_underscores_around_numbers.split("_")
    title_cased_words = []
    for word in words:
        if word.isupper():
            title_cased_words.append(word)
        else:
            title_cased_words.append(word.capitalize())

    return " ".join(title_cased_words)
```

As the code shows, Signature inherits from pydantic.BaseModel and uses SignatureMeta as its metaclass. pydantic.BaseModel provides the schema validation. SignatureMeta is a custom metaclass, and most of Signature's prompt-related logic lives there.

When a signature is created by calling Signature(...) directly, for example with a string, SignatureMeta.__call__ runs first and delegates to make_signature. That function parses and validates the fields, then calls pydantic.create_model to build a pydantic model class (__call__ -> make_signature -> pydantic.create_model) that already contains the input and output fields and their descriptions. Creating that class invokes SignatureMeta.__new__, which is also the hook that runs when a Signature subclass is defined with a class statement. __new__ stores the subtask description in cls.__doc__, uses infer_prefix to normalize each field name into capitalized, space-separated words and stores the result in field.json_schema_extra["prefix"], stores the field description in field.json_schema_extra["desc"], and returns the class. If the input is a string, _parse_signature inside make_signature splits it and wraps each name in a dspy.InputField or dspy.OutputField. At that point, signatures defined by string and signatures defined by class have been unified into the same type.

3. How Signature variables become prompt content

As described above, creating a signature goes through SignatureMeta.__call__ (for the string form) and SignatureMeta.__new__. Together they build the pydantic.BaseModel subclass, format the input and output fields, and store the task description in cls.__doc__. By then, every docstring and variable written in code has been turned into string content that the prompt will use. The prompt itself is filled in when dspy.Module executes.
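The order in which the two metaclass hooks fire is easy to check with a toy, standard-library-only model. The sketch below is not DSPy code: `MiniMeta`, `MiniSignature`, and `make_mini_signature` are hypothetical stand-ins for SignatureMeta, Signature, and make_signature, and the plain `inputs` / `outputs` lists replace the pydantic fields. It only illustrates that a class statement triggers `__new__` alone, while calling the base class goes through `__call__`, then the factory, then `__new__`.

```python
from typing import List, Optional


class MiniMeta(type):
    """Toy metaclass with the same two hooks as the code above."""

    created: List[str] = []  # class names seen by __new__, in creation order

    def __call__(cls, *args, **kwargs):
        # Calling the base class builds a new class instead of an instance.
        if cls is MiniSignature:
            return make_mini_signature(*args, **kwargs)
        return super().__call__(*args, **kwargs)

    def __new__(mcs, name, bases, namespace, **kwargs):
        cls = super().__new__(mcs, name, bases, namespace, **kwargs)
        # No docstring given: derive default instructions from the field names.
        if cls.__doc__ is None:
            ins = ", ".join(f"`{n}`" for n in getattr(cls, "inputs", []))
            outs = ", ".join(f"`{n}`" for n in getattr(cls, "outputs", []))
            cls.__doc__ = f"Given the fields {ins}, produce the fields {outs}."
        MiniMeta.created.append(name)
        return cls


class MiniSignature(metaclass=MiniMeta):
    ""  # empty docstring, so the base class carries no instructions


def make_mini_signature(signature: str, instructions: Optional[str] = None) -> type:
    """Hypothetical factory: turn "a -> b" into a MiniSignature subclass."""
    inputs_str, outputs_str = signature.split("->")
    namespace = {
        "inputs": [v.strip() for v in inputs_str.split(",") if v.strip()],
        "outputs": [v.strip() for v in outputs_str.split(",") if v.strip()],
    }
    if instructions is not None:
        namespace["__doc__"] = instructions
    # Creating the class here is what triggers MiniMeta.__new__.
    return MiniMeta("StringSignature", (MiniSignature,), namespace)


class QA(MiniSignature):  # class statement: only __new__ runs
    """answer the question of user"""

    inputs = ["user_question"]
    outputs = ["answer"]


# Calling the base class: __call__ -> make_mini_signature -> __new__
StringQA = MiniSignature("question -> answer")

if __name__ == "__main__":
    print(MiniMeta.created)
    print(QA.__doc__)
    print(StringQA.__doc__)
```

Both paths end with a class whose `__doc__` holds the task description, which is the unification the walkthrough describes.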
|