Reasoning Model Distillation in Practice: Using a Large Model to Improve a Small Model

01

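Introduction

DeepSeek-R1's surge in popularity has drawn more developers to model distillation, a technique that lets a small model get "private tutoring" and absorb the essential knowledge of a large one. Today we put it into practice with the small Qwen2.5-1.5B model (roughly a middle-schooler of the AI world)!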

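What is model distillation?

It is like an ordinary student learning problem-solving approaches from a top student:

- Teacher model = the top student (for example, DeepSeek-R1)

- Student model = Qwen2.5-1.5B, which needs to improve

- Distillation data = the top student's solution notes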

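A three-step crash course:

1. Produce the "top student's notes" (build the distillation data)

- Have the teacher model work through a large number of problems

- Record its solution process and reference answers

- Organize them into a training set suited to the small model

2. Coach the small model (training stage)

- Focus on imitating the teacher's problem-solving approach

3. Take the exam (model evaluation)

- Prepare a test set, such as math problems

- Compare the scores before and after training

- Observe how much logical reasoning ability improves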

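Follow this process and the small model can inherit the top student's know-how. No expensive hardware is needed; an ordinary GPU is enough for training, so give this "private tutoring" recipe a try.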

02

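Building the Distillation Data

To give the small model suitable study material, we draw the distillation data from a high-quality open-source math dataset such as AI-MO/NuminaMath-CoT.

AI-MO/NuminaMath-CoT: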
https://www.modelscope.cn/datasets/AI-MO/NuminaMath-CoT/summary

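Below we use ModelScope's online model inference service (https://www.modelscope.cn/docs/model-service/API-Inference/intro) with DeepSeek-R1 as the teacher model, and construct a prompt to obtain the solution process and reference answer for one math problem. The Python example follows: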

import os

from openai import OpenAI

system_prompt = (
    'A conversation between User and Assistant. The user asks a question, and the Assistant solves it. '
    'The assistant first thinks about the reasoning process in the mind and then provides the user '
    'with the answer. The reasoning process and answer are enclosed '
    'within <think> </think> and <answer> </answer> tags, respectively, '
    'i.e., <think> reasoning process here </think> <answer> answer here </answer>.'
)

# Not a raw string, so \n is a real newline; \\boxed keeps the literal backslash.
prompt_template = '{question}\nPlease reason step by step, and put your final answer within \\boxed{{}}.'

# Raw string, so the LaTeX backslashes are passed through unchanged.
question = r'Find all real numbers \(x, y, z\) such that \[ x + y + z = 3, \quad x^2 + y^2 + z^2 = 3, \quad x^3 + y^3 + z^3 = 3 \]'

client = OpenAI(
    api_key=os.getenv("MODELSCOPE_SDK_TOKEN"),  # replace with your ModelScope SDK Token
    base_url="https://api-inference.modelscope.cn/v1/"
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # ModelScope Model-Id
    messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': prompt_template.format(question=question)}
    ],
    stream=True
)

for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Either field may be missing or None on a given chunk, so read it defensively.
    reasoning = getattr(delta, 'reasoning_content', None)
    if reasoning:
        print(reasoning, end='', flush=True)
    content = getattr(delta, 'content', None)
    if content:
        print(content, end='', flush=True)

Output:

Okay, so I need to solve this system of equations:

x + y + z = 3,
x² + y² + z² = 3,
x³ + y³ + z³ = 3.

Hmm, let's see. These are symmetric equations, so maybe I can use some symmetric properties or formulas. Let me recall that for symmetric equations, using variables like sums and products might help. Let me denote S = x + y + z, which is given as 3. Then, the sum of squares is given by x² + y² + z² = 3. I also know that the sum of squares can be related to S and the pairwise products. The formula is:

x² + y² + z² = (x + y + z)² - 2(xy + yz + zx).

Plugging in the known values, 3 = (3)² - 2(xy + yz + zx). So, 3 = 9 - 2(xy + yz + zx). Let's solve for the sum of products:

2(xy + yz + zx) = 9 - 3 = 6 ⇒ xy + yz + zx = 3.

Okay, so now I have S = 3, and the sum of products is also 3. Now, moving on to the third equation, which is the sum of cubes: x³ + y³ + z³ = 3. There's a formula for the sum of cubes in terms of S, the sum of products, and the product xyz. The formula is:

x³ + y³ + z³ = (x + y + z)³ - 3(x + y + z)(xy + yz + zx) + 3xyz.

Plugging in the known values: 3 = 3³ - 3*3*3 + 3xyz. Calculating each term: 3³ is 27, 3*3*3 is 27, So, 3 = 27 - 27 + 3xyz ⇒ 3 = 0 + 3xyz ⇒ 3xyz = 3 ⇒ xyz = 1.

So now I have S = 3, sum of products = 3, and product = 1. These are the elementary symmetric sums. Therefore, the monic cubic polynomial whose roots are x, y, z is:

t³ - St² + (xy + yz + zx)t - xyz = 0 ⇒ t³ - 3t² + 3t - 1 = 0.

So, solving this cubic equation should give me all real numbers x, y, z. Let me try to factor this cubic. Maybe it has rational roots? By Rational Root Theorem, possible roots are ±1. Let's test t = 1:

1³ - 3*1² + 3*1 - 1 = 1 - 3 + 3 - 1 = 0. Yes, t = 1 is a root.

So, we can factor out (t - 1). Let's perform polynomial division or use synthetic division. Dividing t³ - 3t² + 3t - 1 by (t - 1):

Using synthetic division:

1 | 1  -3   3  -1
  |     1  -2   1
  ----------------
    1  -2   1   0

So, the quadratic factor is t² - 2t + 1. Which factors further as (t - 1)². Therefore, the cubic factors as (t - 1)³ = 0. So, all three roots are 1. Therefore, the solution is x = y = z = 1.

Wait, but let me check if there could be other solutions. Suppose the cubic factors into (t - 1)^3, which means that all three variables are 1. Let me verify if this satisfies the original equations.

First equation: 1 + 1 + 1 = 3, yes. Second: 1² + 1² + 1² = 3, yes. Third: 1³ + 1³ + 1³ = 3, yes. So that works.

But wait, maybe there are other real solutions where variables are not all 1? Let's see. Since the cubic equation only has t = 1 as a triple root, that's the only solution. Therefore, the only real solution is x = y = z = 1.

But let me think again. Suppose that two variables are equal to 1 and the third is different. Let me test that. Suppose x = 1, y = 1, then z = 3 - 1 - 1 = 1. So, z = 1. So that's still all ones. Alternatively, if one variable is not 1, would that work?

Wait, suppose that x, y, z are all 1, which works. Suppose, for example, x = 2, y = 0, z = 1. Then x + y + z = 3. Check the sum of squares: 4 + 0 + 1 = 5 ≠ 3. So that doesn't work. Alternatively, maybe some other combination. But given that the cubic equation only has 1 as a root, multiplicity three, that suggests that all three variables must be 1.

Alternatively, perhaps complex roots, but the problem asks for real numbers. So, since the only real root is 1 with multiplicity three, the only real solution is x = y = z = 1.

Therefore, the answer is all variables equal to 1.

<answer>
The only real solution is \(x = y = z = 1\). Thus, the numbers are \(\boxed{1}\), \(\boxed{1}\), and \(\boxed{1}\).

Alternatively, since all variables are the same, it can be written as \(\boxed{1}\) for each variable.

**Final Answer**
The real numbers are \(\boxed{1}\), \(\boxed{1}\), and \(\boxed{1}\).
</answer>
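When turning teacher outputs like the one above into training data, it is often handy to pull the final answer out of the `\boxed{...}` markers, for instance to compare it with a dataset's reference answer. The article does not describe how its data was cleaned, so the following is only a minimal sketch of one possible helper; `extract_boxed` is a hypothetical name, not part of any library mentioned here.

```python
def extract_boxed(text: str) -> list[str]:
    """Return the contents of every \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    results = []
    pos = 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            break
        i = start + len(marker)
        depth = 1
        # Walk forward until the brace opened by \boxed{ is closed again.
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced braces: stop rather than guess
        results.append(text[start + len(marker): i - 1])
        pos = i
    return results


teacher_answer = (
    r"The only real solution is \(x = y = z = 1\). "
    r"Thus, the numbers are \(\boxed{1}\), \(\boxed{1}\), and \(\boxed{1}\)."
)
print(extract_boxed(teacher_answer))
print(extract_boxed(r"\boxed{\frac{602}{17}}"))
```

Brace counting is used instead of a simple regular expression so that nested LaTeX such as `\frac{a}{b}` inside the box is kept intact.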

03

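Training the Model

We have already processed a batch of math problems and generated a distillation dataset of 4,000 samples. Each sample contains the problem, the solution process, and the reference answer. We saved it in JSONL format for later use. Dataset preview: https://www.modelscope.cn/datasets/modelscope/MathR/dataPeview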
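As a rough illustration of the "save as JSONL" step, the sketch below assembles a problem, the teacher's reasoning, and its answer into a chat-style record and writes one JSON object per line. The `messages` wrapper key, the `<think>` wrapping, and the helper names `make_sample`, `write_jsonl`, and `read_jsonl` are all assumptions made for this example; the actual schema of the MathR dataset may differ.

```python
import json
import os
import tempfile

# Assumed suffix, mirroring the prompt used when querying the teacher model.
PROMPT_SUFFIX = "\nPlease reason step by step, and put your final answer within \\boxed{}."


def make_sample(question: str, reasoning: str, answer: str) -> dict:
    """Build one chat-style training record (hypothetical schema)."""
    return {
        "messages": [
            {"role": "user", "content": question + PROMPT_SUFFIX},
            # Wrapping the reasoning in <think> tags is an assumption.
            {"role": "assistant", "content": f"<think>\n{reasoning}\n</think>\n\n{answer}"},
        ]
    }


def write_jsonl(samples: list[dict], path: str) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps math symbols readable."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list[dict]:
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


samples = [make_sample("What is 1 + 1?", "1 + 1 = 2.", "The answer is \\(\\boxed{2}\\).")]
path = os.path.join(tempfile.mkdtemp(), "distill.jsonl")
write_jsonl(samples, path)
print(read_jsonl(path) == samples)
```

Reading the file back right after writing it is a cheap sanity check that every line is valid JSON before the data is handed to a training framework.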

Next, we use the ms-swift (https://github.com/modelscope/ms-swift) training framework to train the Qwen2.5-1.5B model on this data.

A sample training example:

[{"role": "user", "content": "A set of consecutive positive integers beginning with $1$ is written on a blackboard. One number is erased. The average (arithmetic mean) of the remaining numbers is $35\\frac{7}{17}$. What number was erased?\n$\\textbf{(A)}\\ 6\\qquad \\textbf{(B)}\\ 7\\qquad \\textbf{(C)}\\ 8\\qquad \\textbf{(D)}\\ 9\\qquad \\textbf{(E)}\\ \\text{cannot be determined}$\nPlease reason step by step, and put your final answer within \\boxed{}."}, {"role": "assistant", "content": "\nOkay, let's see. I need to figure out which.......Answer is B.\n\n**Final Answer**\n\\boxed{B}\n\n\nGiven a set of consecutive positive integers starting......Average of remaining numbers: \\(\\frac{2408}{68} = \\frac{602}{17} = 35\\frac{7}{17}\\)\n\nThus, the number erased is \\(\\boxed{B}\\)."}]

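Note: Because GPU memory is limited, we fine-tune Qwen2.5-1.5B with LoRA. LoRA is an efficient fine-tuning method that adapts a model by adding low-rank matrices while leaving the original model parameters unchanged, which greatly reduces training cost and time. If you have a more powerful GPU, consider using more training data and full-parameter fine-tuning.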
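To make the "added low-rank matrices" idea concrete, here is a small pure-Python sketch of the usual LoRA formulation, W_eff = W + (alpha / r) * B A, where W stays frozen and only A and B are trained. It is a conceptual illustration, not how ms-swift implements LoRA; the function names, the `alpha` value, and the 1536 x 1536 layer size are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, rank):
    """Effective weight W + (alpha / rank) * B @ A; W itself is never modified."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # frozen 2 x 3 weight
a = [[0.1, 0.2, 0.3]]                   # rank-1 factor A (1 x 3)
b_zero = [[0.0], [0.0]]                 # B starts at zero, so training begins from W
b_trained = [[1.0], [2.0]]              # pretend values after some training

print(lora_weight(w, a, b_zero, alpha=1, rank=1) == w)
print(lora_weight(w, a, b_trained, alpha=1, rank=1))

# Illustrative size only: a square 1536 x 1536 layer with rank 16.
full = 1536 * 1536
added = lora_param_count(1536, 1536, 16)
print(full, added, f"{added / full:.2%}")
```

For that illustrative layer, the rank-16 adapter trains 49,152 parameters instead of 2,359,296, which is where the savings in memory and training time come from.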

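The command below also uses SwanLab (https://github.com/SwanHubX/SwanLab) to visualize the training process, which makes it easy to watch how metrics such as loss change during training. Replace YOUR_SWANLAB_TOKEN in the command with your own token.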

!CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --train_type lora \
    --lora_rank 16 \
    --torch_dtype bfloat16 \
    --dataset 'modelscope/MathR:clean' \
    --split_dataset_ratio 0 \
    --max_length 4096 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 16 \
    --save_steps 100 \
    --save_total_limit 10 \
    --logging_steps 5 \
    --report_to swanlab \
    --swanlab_token YOUR_SWANLAB_TOKEN \
    --swanlab_mode cloud
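The batch-size flags in the command interact: with a per-device batch size of 1 and 16 gradient accumulation steps on one GPU, each optimizer update sees 16 samples. The sketch below works through that arithmetic. The sample count of 4,000 is taken from the text for illustration; the size of the `clean` subset actually used is not stated, and how a trainer rounds a final partial batch can vary, so treat the step count as an estimate.

```python
import math


def optimizer_steps(num_samples, per_device_batch_size, grad_accum_steps,
                    num_devices=1, epochs=1):
    """Return (effective batch size, estimated number of optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Rounding up assumes a final partial batch still triggers an update.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


effective_batch, steps = optimizer_steps(
    num_samples=4000,          # illustrative; see the note above
    per_device_batch_size=1,   # --per_device_train_batch_size 1
    grad_accum_steps=16,       # --gradient_accumulation_steps 16
)
print(effective_batch, steps)
```

Knowing the approximate step count helps when picking values such as `--save_steps` and `--logging_steps`, since both are expressed in optimizer steps.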

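Run the following command in a console to chat with the trained model and get a feel for how it performs:

Note: replace the --adapters argument with the path to your own trained checkpoint. --stream true enables streaming inference, --infer_backend pt selects PyTorch as the inference backend, --temperature 0 removes randomness from generation, and --max_new_tokens 2048 caps the number of generated tokens at 2048.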

swift infer \
    --adapters 'output/Qwen2.5-1.5B-Instruct/v11-20250415-120200/checkpoint-81' \
    --stream true \
    --infer_backend pt \
    --temperature 0 \
    --max_new_tokens 2048
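Why a temperature of 0 removes randomness can be seen from how temperature rescales the model's logits before sampling. The following is a conceptual sketch only, with a hypothetical `next_token_probs` helper; real inference backends have their own handling of temperature 0, typically by switching to greedy decoding.

```python
import math


def next_token_probs(logits, temperature):
    """Softmax over logits / temperature; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.5, 0.1, 0):
    print(t, [round(p, 3) for p in next_token_probs(logits, t)])
```

As the temperature shrinks, probability mass concentrates on the highest-scoring token, and in the limit the same token is always chosen, which makes before/after comparisons reproducible.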

04

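Model Performance Before and After

After training, we evaluate the model on a fresh set of math problems. Here we use the gsm8k dataset (a math problem dataset), which you can browse here: https://www.modelscope.cn/datasets/modelscope/gsm8k/dataPeview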

from evalscope import run_task, TaskConfig

task_config = TaskConfig(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # original model
    datasets=["gsm8k"],  # dataset name
    dataset_args={
        "gsm8k": {"few_shot_num": 0},  # few_shot_num: 0 means no few-shot examples
    },
    generation_config={
        "max_new_tokens": 4096,  # maximum number of generated tokens
        "temperature": 0,  # sampling temperature; 0 means greedy decoding
    },
    eval_batch_size=10,  # batch size during evaluation
    limit=100  # evaluation set size: take the first 100 examples
)
run_task(task_config)

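The results are as follows:

To evaluate the trained model, run the command below to merge the trained LoRA weights back into the original model. This produces a new Qwen2.5-1.5B-Instruct model, saved under the checkpoint-xxx-merged directory.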

!swift export \
    --adapters /mnt/data/data/user/maoyunlin.myl/tools/course/distill/output/Qwen2.5-1.5B-Instruct/v11-20250415-120200/checkpoint-81 \
    --merge_lora true

# Evaluate the model after distillation training
from evalscope import run_task, TaskConfig

# Remember to replace the model path below with your own
task_config = TaskConfig(
    model="/mnt/data/data/user/maoyunlin.myl/tools/course/distill/output/Qwen2.5-1.5B-Instruct/v11-20250415-120200/checkpoint-81-merged",
    datasets=["gsm8k"],
    dataset_args={
        "gsm8k": {"few_shot_num": 0},
    },
    generation_config={
        "max_new_tokens": 4096,
        "temperature": 0,
    },
    eval_batch_size=10,
    limit=100
)
run_task(task_config)

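The results are as follows:

Visualizing the Results

The results show that the model's answer accuracy improved by 12% after training, which is a sizable gain. We can also use a visualization tool to analyze the model's reasoning process further and better understand how it arrives at its answers.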

import os
os.environ['GRADIO_ROOT_PATH'] = f"/{os.environ['JUPYTER_NAME']}/proxy/7860"
print(os.environ['GRADIO_ROOT_PATH'])

!evalscope app
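For readers who want to reproduce a before/after comparison by hand, the sketch below shows one simple way to compute exact-match accuracy and the gain in percentage points. It is not evalscope's metric implementation, and the answer lists are toy values invented for illustration; they are not the results of this tutorial.

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data invented for illustration only; NOT the tutorial's results.
references = ["18", "3", "70000", "540", "20"]
before = ["18", "5", "70000", "500", "21"]  # hypothetical answers from the base model
after = ["18", "3", "70000", "540", "21"]   # hypothetical answers after training

acc_before = accuracy(before, references)
acc_after = accuracy(after, references)
gain_points = (acc_after - acc_before) * 100
print(f"before={acc_before:.0%} after={acc_after:.0%} gain={gain_points:.0f} points")
```

With `limit=100` as in the evaluation configs, each question is worth exactly one percentage point, so small differences between runs should be read with that granularity in mind.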

05

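Summary

In this tutorial we walked through the complete process of distilling a small model from a teacher model, covering three key stages: data construction, model training, and model evaluation. With this approach you can efficiently train a small model of your own. We hope the tutorial helps you master the technique and apply it flexibly in real projects!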