
A Thread-by-Thread Breakdown: How DSPy's Core Modules Are Implemented, Part 01: The Signature Class

链载Ai · Posted yesterday at 10:40


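The earlier article, "A Possible New Future for Prompts: Getting Started with DSPy from 0 to 1", introduced the DSPy workflow through examples, moving from simple to more advanced usage, and then showed two cases that combine DSPy with the well-known LLM application framework LangChain. This article uses examples and source-code analysis to explain how DSPy works under the hood.
DSPy is a framework from the Stanford NLP group for generating and optimizing prompts. It replaces hand-written prompts with a code-centric programming abstraction. Its core ideas are:

(1) Separate the overall flow from the individual steps. Each step focuses on one specific job, and together the steps carry out the prompt optimization.

(2) Introduce a variety of optimizers. These optimizers are LM-driven algorithms that adjust the prompts and parameters of LM calls according to a metric you define.

These ideas are implemented mainly by three core modules: Signature, Module, and Optimizer.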

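How DSPy Works: the Signature Class

The Signature class is one of DSPy's core modules. A Signature is a declarative specification that defines the input/output behavior of a DSPy module. It tells the language model what task to perform, rather than how we should prompt the language model to do it.
A signature has three basic elements:
  • A concise description of the sub-task the language model is meant to solve.

  • A description of one or more input fields that we provide to the language model (for example, the input question).

  • A description of one or more output fields that we expect from the language model (for example, the answer to the question).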
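Basic usage of Signature
Let's start with an example. A signature needs the three elements above, and the code below supplies them in three steps:

1. Write the sub-task description in the docstring directly under the class declaration.

2. Define the input fields (required), and optionally give each input field a description.

3. Define the output fields (required), and optionally give each output field a description.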
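import dspy


class QA(dspy.Signature):
    """answer the question of user"""

    # The desc string means "the user's question"
    user_question = dspy.InputField(desc="用户的问题")
    answer = dspy.OutputField()

Next we run one inference with dspy.ChainOfThought and inspect the final prompt:

question = "what is the color of the sea?"
summarize = dspy.ChainOfThought(QA)
# The keyword argument must match the input field declared on QA
response = summarize(user_question=question)

# Inspect the prompt; `lm` is the language model client configured for DSPy beforehand
lm.inspect_history(n=1)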
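The prompt is shown below. It shows that Signature takes the docstring from the class definition and places it at the top of the prompt as the description of the sub-task. It then lays out the input and output fields in a uniform format (each word capitalized, words separated by spaces) before and after Reasoning. The Reasoning text is the fixed prompt used by the CoT (chain-of-thought) mode.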
answer the question of user
---
Follow the following format.
User Question: 用户的问题
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
Question: what is the color of the sea?
Reasoning: Let's think step by step in order to Question: what is the color of the sky?
Reasoning: Let's think step by step in order to determine the color of the sky. The sky appears blue due to the scattering of light waves in the atmosphere. The blue light is scattered more efficiently than other colors of light, which is why the sky appears blue.
Answer: Blue
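The label format in the prompt (for example, user_question becoming "User Question:") can be checked in isolation. The sketch below is a slightly condensed copy of the infer_prefix logic from the signature.py listing in this article, so it only needs the standard library. The field name camelCaseText is a made-up example, not something from DSPy.

```python
import re


def infer_prefix(attribute_name: str) -> str:
    """Turn a field name such as `user_question` into a label such as `User Question`."""
    # Split camelCase into snake_case first.
    s1 = re.sub("(.)([A-Z][a-z]+)", r"\1_\2", attribute_name)
    name = re.sub("([a-z0-9])([A-Z])", r"\1_\2", s1)
    # Put underscores between letters and digits so they become separate words.
    name = re.sub(r"([a-zA-Z])(\d)", r"\1_\2", name)
    name = re.sub(r"(\d)([a-zA-Z])", r"\1_\2", name)
    # Capitalize each word, leaving all-caps acronyms untouched.
    words = [w if w.isupper() else w.capitalize() for w in name.split("_")]
    return " ".join(words)


# The metaclass appends ":" to the inferred prefix before storing it.
labels = {
    name: infer_prefix(name) + ":"
    for name in ("user_question", "answer", "camelCaseText")
}
print(labels)
```

Both snake_case and camelCase names end up as space-separated, capitalized words, which is why the field names in the prompt look uniform regardless of how they were written in code.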
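The example above shows how to produce a prompt by writing code, so that the prompt includes our task description and the descriptions of the input and output fields.
Besides subclassing, a Signature can also be defined with a string that names the inputs and outputs:
summarize = dspy.ChainOfThought('question -> answer')
This form can only define the names of the input and output fields. It cannot define a task description or field descriptions, so the generated prompt uses the default task description: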
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
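The first line of that prompt can be reproduced with a few lines of Python. This is only a sketch: it assumes the default description is built purely from the field names, which matches the text shown above. The helper name default_instructions is hypothetical and does not reproduce DSPy's own _default_instructions, which is elided from the source listing in this article.

```python
from typing import List


def default_instructions(inputs: List[str], outputs: List[str]) -> str:
    """Build a fallback task description from field names alone (hypothetical helper)."""
    inputs_ = ", ".join(f"`{name}`" for name in inputs)
    outputs_ = ", ".join(f"`{name}`" for name in outputs)
    return f"Given the fields {inputs_}, produce the fields {outputs_}."


print(default_instructions(["question"], ["answer"]))
# Given the fields `question`, produce the fields `answer`.
```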
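That covers the two ways of using Signature.

Signature source code walkthrough

So how does the Signature class implement the behavior above?
We will use the source code to answer the following questions:
  • The execution flow of Signature

  • How a string is converted into a Signature class

  • How the variables of a Signature become content in the prompt
Let's step into the Signature code for a brief explanation (file location: dspy\signatures):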
# signature.py (partially abridged)
import ast
import re
import types
import typing
from copy import deepcopy
from typing import Any, Dict, Tuple, Type, Union  # noqa: UP035

from pydantic import BaseModel, Field, create_model
from pydantic.fields import FieldInfo

import dsp
from dspy.signatures.field import InputField, OutputField, new_to_old_field


class SignatureMeta(type(BaseModel)):  # metaclass of Signature
    def __call__(cls, *args, **kwargs):  # noqa: ANN002
        if cls is Signature:
            return make_signature(*args, **kwargs)
        return super().__call__(*args, **kwargs)

    def __new__(mcs, signature_name, bases, namespace, **kwargs):  # noqa: N804
        # Called when a Signature class is being created
        # Set `str` as the default type for all fields
        raw_annotations = namespace.get("__annotations__", {})
        for name, field in namespace.items():
            if not isinstance(field, FieldInfo):
                continue  # Don't add types to non-field attributes
            if not name.startswith("__") and name not in raw_annotations:
                raw_annotations[name] = str
        namespace["__annotations__"] = raw_annotations

        # Let Pydantic do its thing
        cls = super().__new__(mcs, signature_name, bases, namespace, **kwargs)

        # If we don't have instructions, it might be because we are a derived generic type.
        # In that case, we should inherit the instructions from the base class.
        if cls.__doc__ is None:
            for base in bases:
                if isinstance(base, SignatureMeta):
                    doc = getattr(base, "__doc__", "")
                    if doc != "":
                        cls.__doc__ = doc

        # The more likely case is that the user has just not given us a type.
        # In that case, we should default to the input/output format.
        if cls.__doc__ is None:
            cls.__doc__ = _default_instructions(cls)

        # Ensure all fields are declared with InputField or OutputField
        cls._validate_fields()

        # Ensure all fields have a prefix
        for name, field in cls.model_fields.items():
            if "prefix" not in field.json_schema_extra:
                field.json_schema_extra["prefix"] = infer_prefix(name) + ":"
            if "desc" not in field.json_schema_extra:
                field.json_schema_extra["desc"] = f"${{{name}}}"

        return cls

    ...


class Signature(BaseModel, metaclass=SignatureMeta):
    ""  # noqa: D419

    # Note: Don't put a docstring here, as it will become the default instructions
    # for any signature that doesn't define its own instructions.
    pass


def make_signature(  # create a Signature type from the given arguments
    signature: Union[str, Dict[str, Tuple[type, FieldInfo]]],
    instructions: str = None,
    signature_name: str = "StringSignature",
) -> Type[Signature]:
    """Create a new Signature type with the given fields and instructions.

    Note:
        Even though we're calling a type, we're not making an instance of the type.
        In general, instances of Signature types are not allowed to be made. The call
        syntax is provided for convenience.

    Args:
        signature: The signature format, specified as "input1, input2 -> output1, output2".
        instructions: An optional prompt for the signature.
        signature_name: An optional name for the new signature type.
    """
    fields = _parse_signature(signature) if isinstance(signature, str) else signature

    # Validate the fields, this is important because we sometimes forget the
    # slightly unintuitive syntax with tuples of (type, Field)
    fixed_fields = {}
    for name, type_field in fields.items():
        if not isinstance(name, str):
            raise ValueError(f"Field names must be strings, not {type(name)}")
        if isinstance(type_field, FieldInfo):
            type_ = type_field.annotation
            field = type_field
        else:
            if not isinstance(type_field, tuple):
                raise ValueError(f"Field values must be tuples, not {type(type_field)}")
            type_, field = type_field
        # It might be better to be explicit about the type, but it currently would break
        # program of thought and teleprompters, so we just silently default to string.
        if type_ is None:
            type_ = str
        # if not isinstance(type_, type) and not isinstance(typing.get_origin(type_), type):
        if not isinstance(type_, (type, typing._GenericAlias, types.GenericAlias)):
            raise ValueError(f"Field types must be types, not {type(type_)}")
        if not isinstance(field, FieldInfo):
            raise ValueError(f"Field values must be Field instances, not {type(field)}")
        fixed_fields[name] = (type_, field)

    # Fixing the fields shouldn't change the order
    assert list(fixed_fields.keys()) == list(fields.keys())  # noqa: S101

    # Default prompt when no instructions are provided
    if instructions is None:
        sig = Signature(signature, "")  # Simple way to parse input/output fields
        instructions = _default_instructions(sig)

    return create_model(
        signature_name,
        __base__=Signature,
        __doc__=instructions,
        **fixed_fields,
    )


def _parse_signature(signature: str) -> Tuple[Type, Field]:
    # Convert the string form of the inputs/outputs into field objects
    if signature.count("->") != 1:
        raise ValueError(f"Invalid signature format: '{signature}', must contain exactly one '->'.")

    fields = {}
    inputs_str, outputs_str = map(str.strip, signature.split("->"))
    inputs = [v.strip() for v in inputs_str.split(",") if v.strip()]
    outputs = [v.strip() for v in outputs_str.split(",") if v.strip()]
    for name_type in inputs:
        name, type_ = _parse_named_type_node(name_type)
        fields[name] = (type_, InputField())
    for name_type in outputs:
        name, type_ = _parse_named_type_node(name_type)
        fields[name] = (type_, OutputField())

    return fields


def infer_prefix(attribute_name: str) -> str:
    # Unify the label format used in the prompt, e.g. capitalize each word
    """Infer a prefix from an attribute name."""
    # Convert camelCase to snake_case, but handle sequences of capital letters properly
    s1 = re.sub("(.)([A-Z][a-z]+)", r"\1_\2", attribute_name)
    intermediate_name = re.sub("([a-z0-9])([A-Z])", r"\1_\2", s1)

    # Insert underscores around numbers to ensure spaces in the final output
    with_underscores_around_numbers = re.sub(
        r"([a-zA-Z])(\d)",
        r"\1_\2",
        intermediate_name,
    )
    with_underscores_around_numbers = re.sub(
        r"(\d)([a-zA-Z])",
        r"\1_\2",
        with_underscores_around_numbers,
    )

    # Convert snake_case to 'Proper Title Case', but ensure acronyms are uppercased
    words = with_underscores_around_numbers.split("_")
    title_cased_words = []
    for word in words:
        if word.isupper():
            title_cased_words.append(word)
        else:
            title_cased_words.append(word.capitalize())

    return " ".join(title_cased_words)
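1. Execution flow
The code shows that the Signature class inherits from pydantic.BaseModel and uses SignatureMeta as its metaclass. pydantic.BaseModel provides field validation. SignatureMeta is a custom metaclass, and most of Signature's prompt-related logic lives there.
When a signature is given as a string, Signature(...) is called directly, which first invokes SignatureMeta.__call__. Because cls is Signature in that case, __call__ delegates to make_signature, which parses and validates the fields and finally calls pydantic.create_model to build the pydantic model class (__call__ -> make_signature -> pydantic.create_model). That class already contains the input and output fields and their descriptions.
Creating the class, whether through create_model or through an ordinary class definition that subclasses Signature, then runs SignatureMeta.__new__. This function stores the sub-task description in cls.__doc__, uses infer_prefix to normalize each field name into a capitalized label stored in field.json_schema_extra["prefix"], stores the field description in field.json_schema_extra["desc"], and returns the class.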
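The order in which the two metaclass hooks fire can be observed with a toy metaclass. The sketch below is not DSPy code and involves no pydantic: SketchMeta, SketchSignature, make_sketch, and the calls list are all hypothetical names, and the class is built by calling the metaclass directly rather than through create_model. It only mirrors the shape of the flow described above.

```python
from typing import List, Optional

calls: List[str] = []  # records the order in which the hooks fire


class SketchMeta(type):
    """Toy stand-in for SignatureMeta (hypothetical, no pydantic)."""

    def __call__(cls, *args, **kwargs):
        calls.append("__call__")
        # Calling the base class builds a new class instead of an instance.
        if cls is SketchSignature:
            return make_sketch(*args, **kwargs)
        return super().__call__(*args, **kwargs)

    def __new__(mcs, name, bases, namespace, **kwargs):
        calls.append(f"__new__({name})")
        cls = super().__new__(mcs, name, bases, namespace, **kwargs)
        # No docstring means no task description, so fall back to a default.
        if cls.__doc__ is None:
            cls.__doc__ = f"Given the fields `{cls.inputs}`, produce the fields `{cls.outputs}`."
        return cls


class SketchSignature(metaclass=SketchMeta):
    ""  # empty docstring, so the base class contributes no instructions
    inputs = ""
    outputs = ""


def make_sketch(spec: str, instructions: Optional[str] = None) -> type:
    calls.append("make_sketch")
    inputs, outputs = (part.strip() for part in spec.split("->"))
    namespace = {"inputs": inputs, "outputs": outputs, "__doc__": instructions}
    # Building the class through the metaclass triggers SketchMeta.__new__.
    return SketchMeta("StringSignature", (SketchSignature,), namespace)


class QA(SketchSignature):
    """answer the question of user"""

    inputs = "user_question"
    outputs = "answer"


string_sig = SketchSignature("question -> answer")
print(calls)
print(QA.__doc__)
print(string_sig.__doc__)
```

In this sketch the class-based definition only ever reaches __new__, while the string-based call goes through __call__, then the factory, then __new__. Both paths end with a class whose __doc__ holds the task description.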
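2. How a string becomes a Signature class
If the input is a string, the _parse_signature function called inside make_signature splits the string on "->" and on commas, and turns each name into a dspy.InputField or dspy.OutputField. From this point on, signatures defined by a string and signatures defined by a class share the same representation.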
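The string handling can be tried without DSPy installed. The sketch below keeps the same splitting logic as _parse_signature but simplifies the result: the hypothetical parse_signature_sketch maps each name to the plain string "input" or "output", whereas the real function returns (type, InputField()) or (type, OutputField()) tuples and resolves type annotations through _parse_named_type_node, which is not shown in this article.

```python
from typing import Dict


def parse_signature_sketch(signature: str) -> Dict[str, str]:
    """Map each field name in 'a, b -> c' to "input" or "output" (types are ignored)."""
    if signature.count("->") != 1:
        raise ValueError(
            f"Invalid signature format: '{signature}', must contain exactly one '->'."
        )

    inputs_str, outputs_str = map(str.strip, signature.split("->"))
    fields: Dict[str, str] = {}
    # Inputs are added first, so the dict preserves the declared field order.
    for name in (v.strip() for v in inputs_str.split(",") if v.strip()):
        fields[name] = "input"
    for name in (v.strip() for v in outputs_str.split(",") if v.strip()):
        fields[name] = "output"
    return fields


print(parse_signature_sketch("context, question -> answer"))
# {'context': 'input', 'question': 'input', 'answer': 'output'}
```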
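3. How the variables of a Signature become prompt content

During creation, SignatureMeta's __call__ and __new__ functions build a class based on pydantic.BaseModel, normalize the input and output fields, and store the task description in cls.__doc__. At that point every docstring and variable declared in code has become a string ready for use in the prompt. The prompt itself is filled in when a dspy.Module runs.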
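To make the last step concrete, here is a rough sketch of how stored prefix and desc strings could be laid out as the format section of a prompt. It is an illustration only, not DSPy's template code: render_format_section is a hypothetical helper, the field dictionaries merely imitate what __new__ stores in json_schema_extra, and the Reasoning line that dspy.ChainOfThought adds is left out.

```python
from typing import Dict, List


def render_format_section(instructions: str, fields: List[Dict[str, str]]) -> str:
    """Lay out the task description and one 'Prefix: desc' line per field."""
    lines = [instructions, "---", "Follow the following format."]
    for field in fields:
        lines.append(f"{field['prefix']} {field['desc']}")
    lines.append("---")
    return "\n".join(lines)


# Values shaped like the defaults filled in for a 'question -> answer' signature.
fields = [
    {"prefix": "Question:", "desc": "${question}"},
    {"prefix": "Answer:", "desc": "${answer}"},
]
prompt = render_format_section(
    "Given the fields `question`, produce the fields `answer`.", fields
)
print(prompt)
```

Because everything the template needs is already a plain string on the class, the module that runs the signature only has to concatenate and fill in values.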

