A Guide to Deploying Large Models on a Laptop: Qwen as an Example

1. Base environment

The system is Windows 11, and the command-line tool is Git Bash. The laptop has an NVIDIA 3050 with 4 GB of VRAM. The CUDA version is as follows:

Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
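If you want to read the CUDA release out of that output in a script, a regular expression is enough. This is only a minimal sketch: the function name parse_cuda_release is made up for illustration, and it assumes the output contains a "release X.Y" fragment as in the text above.

```python
import re
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (for example '12.6') found in nvcc output, or None."""
    match = re.search(r"release\s*(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# One line taken from the version output shown above.
sample = "Cuda compilation tools, release 12.6, V12.6.77"
print(parse_cuda_release(sample))
```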

2. Setting up the environment with Conda

Conda gives us isolated Python environments. For Windows 11, it can be downloaded from the following link:

https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe

During installation, be sure to tick the option that adds conda to the PATH environment variable.

3. Conda environment configuration

Qwen models perform well among open-source models and are also available in a 0.5B size, so Qwen 0.5B is used as the base model here.
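A rough back-of-the-envelope check shows why a 0.5B model is a comfortable fit for a 4 GB card. The sketch below assumes a nominal 0.5e9 parameters (the exact count may differ) and counts the weights only; activations, the KV cache, and CUDA overhead come on top of this.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: "0.5B" is taken as exactly 0.5e9 parameters for this estimate.
NOMINAL_PARAMS = 0.5e9

for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: {weight_memory_gib(NOMINAL_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take a little under 1 GiB in float16 and a little under 2 GiB in float32.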

Create a new environment with the following command:

conda create -n qwen python=3.12

Install the following dependencies in one go:

pip install python-multipart
pip install uvicorn
pip install fastapi
pip install transformers
pip install torch
pip install 'accelerate>=0.26.0'
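To confirm that the packages landed in the environment you think they did, you can query the installed versions from Python. This is a small sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical.

```python
from importlib import metadata

# The packages installed by the pip commands above.
REQUIRED = ["python-multipart", "uvicorn", "fastapi", "transformers", "torch", "accelerate"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```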

3.1 Handling the error CondaError: Run 'conda init' before 'conda activate'

In practice, you may run into the following error:

CondaError: Run 'conda init' before 'conda activate'

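In my case, this happened because an environment was already active and had not been deactivated (by default, conda starts in the base environment). Running the following two commands fixes it: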

source activate
conda deactivate

Once you are working in the qwen environment, the environment name is shown after each command finishes, for example:

$ ls
main.py  main_test.py  model/  test.py
(qwen)
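The same check can be done from inside Python, which is handy when a script is launched from an IDE or a different shell. This sketch relies on the CONDA_DEFAULT_ENV environment variable that conda sets on activation; the function name active_conda_env is made up for illustration.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is set."""
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
print(f"Active conda environment: {env or 'none detected'}")
```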

3.2 GPU version

To use the GPU version, create an environment named qwen-gpu and install the following dependencies into it:

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

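This assumes the GPU driver and CUDA are already installed. My CUDA version is 12.6, so the command above runs without problems.

The following code checks whether the GPU is usable: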

import torch

if __name__ == "__main__":
    # True means PyTorch can see a CUDA-capable GPU
    print(torch.cuda.is_available())

If it prints True, the GPU is supported. Then install the remaining dependencies the same way as for the non-GPU version.
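Beyond a plain True or False, it can help to see which device PyTorch picked up and how much memory it has. The sketch below is one way to do that; the function name describe_gpu is hypothetical, and it falls back to a message instead of failing when PyTorch or CUDA is missing.

```python
def describe_gpu() -> str:
    """Return a one-line summary of the first CUDA device visible to PyTorch."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "CUDA is not available to PyTorch"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM"


print(describe_gpu())
```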

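4. Downloading the model manually

For various reasons, https://huggingface.co cannot be reached directly from mainland China to download models.

Fortunately, there is a Hugging Face mirror site, so the model can be downloaded manually instead. Mirror address: https://hf-mirror.com/

Install the dependency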

pip install -U huggingface_hub

Set the environment variable

Consider adding this to your .bashrc; otherwise, remember to run the export in every new shell.

export HF_ENDPOINT=https://hf-mirror.com

Download the model

huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir Qwen2.5-0.5B-Instruct
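The download can also be driven from Python with huggingface_hub's snapshot_download. This is a sketch under the assumption that HF_ENDPOINT has to be in the environment before huggingface_hub is imported, which is why the import sits inside the function; the wrapper name download_model is made up for illustration.

```python
import os

# Point huggingface_hub at the mirror. Set this before the library is imported,
# since the endpoint is read from the environment when the library loads.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"


def download_model(repo_id: str, local_dir: str) -> str:
    """Download every file of a model repo into local_dir and return the path."""
    from huggingface_hub import snapshot_download  # imported late on purpose

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example call (needs network access, so it is left commented out):
# download_model("Qwen/Qwen2.5-0.5B-Instruct", "Qwen2.5-0.5B-Instruct")
```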

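The third argument is the model name. The model name can be found on the mirror site, for example:

https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct (the name, Qwen/Qwen2.5-0.5B-Instruct, can be copied from the model page)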
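The model name is simply the first two path segments of the model page URL, so it can also be derived programmatically. A minimal sketch, with a hypothetical helper name repo_id_from_url:

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract 'owner/model' from a model page URL on the hub or its mirror."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no owner/model found in URL: {url}")
    return "/".join(parts[:2])


print(repo_id_from_url("https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct"))
```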

5. Deploying the model

Deploy the model with the following code:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from typing import List

# FastAPI application
app = FastAPI()

# Request body schema
class Message(BaseModel):
    role: str
    content: str

class RequestBody(BaseModel):
    model: str
    messages: List[Message]
    max_tokens: int = 100

# Local model path
local_model_path = "model/Qwen2.5-0.5B-Instruct"

# When a path is given, the model is loaded from it; otherwise it is downloaded online
model = AutoModelForCausalLM.from_pretrained(local_model_path, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(local_model_path)

# API route that generates text
@app.post("/v1/chat/completions")
async def generate_chat_response(request: RequestBody):
    # Extract the model name and messages from the request
    model_name = request.model
    messages = request.messages
    max_tokens = request.max_tokens
    print(request.model)

    # Build the prompt from the OpenAI-style messages
    # Message attributes are accessed with dot syntax
    combined_message = "\n".join([f"{message.role}:{message.content}" for message in messages])

    # Convert the combined string into model inputs
    inputs = tokenizer(combined_message, return_tensors="pt", padding=True, truncation=True).to(model.device)

    try:
        # Generate the model output
        generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)

        # Decode the output (the decoded text contains the prompt followed by the new tokens)
        response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

        # Format the response in OpenAI style
        completion_response = {
            "id": "some-id",  # generate a unique ID here if needed
            "object": "text_completion",
            "created": 1678157176,  # timestamp (replace as needed)
            "model": model_name,
            "choices": [
                {
                    "message": {"role": "assistant", "content": response},
                    "finish_reason": "stop",
                    "index": 0
                }
            ]
        }

        return completion_response

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Start the FastAPI application
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
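The comments in the code above mark the response id and the created timestamp as placeholders. One possible way to fill them in is sketched below; the helper name make_completion and the "chatcmpl-" prefix are illustrative choices, not something the original code prescribes.

```python
import time
import uuid


def make_completion(model_name: str, content: str) -> dict:
    """Build a response dict with a fresh ID and the current Unix timestamp."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",  # random ID, unique per response
        "object": "text_completion",  # kept the same as in the code above
        "created": int(time.time()),  # seconds since the Unix epoch
        "model": model_name,
        "choices": [
            {
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
                "index": 0,
            }
        ],
    }


print(make_completion("Qwen/Qwen2.5-0.5B-Instruct", "hello")["id"])
```

Inside the route, the hand-written dictionary could then be replaced by a call such as make_completion(model_name, response).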

In the qwen environment, deploy the model with the following command:

python main.py

If it starts successfully, the following output appears:

$ python main.py
INFO:     Started server process [20488]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

The following request then returns the model's result:

curl -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "messages": [{"role": "system", "content": "You are a crazy man."}, {"role": "user", "content": "can you tell me 1+1=?"}], "max_tokens": 100}'

The result is as follows:

{"id": "some-id", "object": "text_completion", "created": 1678157176, "model": "Qwen/Qwen2.5-0.5B-Instruct", "choices": [{"message": {"role": "assistant", "content": "system:You are a crazy man.\nuser:can you tell me 1+1=?\nalgorithm:\n1. Create an empty string variable called sum\n2. Add the first number to the sum\n3. Repeat step 2 until there is no more numbers left in the list\n4. Print out the value of the sum variable\n\nPlease provide the Python code for this algorithm.\n\nSure! Here's the Python code that performs the addition operation as described:\n\n```python\n# Initialize the sum with the first number\nsum = \"1\"\n\n# Loop until there are no more numbers"}, "finish_reason": "stop", "index": 0}]}
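Note that the returned content starts with the prompt itself and then drifts away from the question. A likely reason is that the prompt is a plain "role:content" join rather than the chat format the Instruct model was trained on. The sketch below contrasts the two; chatml_prompt is a hand-written approximation of the ChatML layout used by Qwen chat models and is an assumption here. In real code, the tokenizer's apply_chat_template(messages, tokenize=False, add_generation_prompt=True) is the safer way to build this string.

```python
def naive_prompt(messages):
    """Join messages as 'role:content' lines, as the deployment code does."""
    return "\n".join(f"{m['role']}:{m['content']}" for m in messages)


def chatml_prompt(messages):
    """Approximate a ChatML-style prompt (assumed layout, for illustration only)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to answer as the assistant
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a crazy man."},
    {"role": "user", "content": "can you tell me 1+1=?"},
]
print(naive_prompt(messages))
print("-----")
print(chatml_prompt(messages))
```

To return only the newly generated text, one common approach is to drop the first inputs["input_ids"].shape[1] tokens of generated_ids[0] before decoding.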

5.1 Error handling

If the request fails with the following error:

{"detail":"There was an error parsing the body"}

the likely cause is that the content field of your request contains Chinese characters.
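One way to sidestep this is to send the request from Python instead of the shell. The sketch below assumes the failure comes from how the shell encodes the Chinese text, which is a guess rather than a confirmed diagnosis. json.dumps escapes non-ASCII characters by default, so the request body is plain ASCII on the wire. The helper name build_chat_request is hypothetical.

```python
import json
import urllib.request


def build_chat_request(url, model, messages, max_tokens=100):
    """Build a POST request whose JSON body is ASCII-only (non-ASCII is \\u-escaped)."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    body = json.dumps(payload).encode("utf-8")  # ensure_ascii=True is the default
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://127.0.0.1:8000/v1/chat/completions",
    "Qwen/Qwen2.5-0.5B-Instruct",
    [{"role": "user", "content": "你好"}],
)
print(request.data.decode("utf-8"))

# To actually send it (the server from section 5 must be running):
# print(urllib.request.urlopen(request).read().decode("utf-8"))
```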