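To help readers better understand and apply the InternLM (书生) family of large models, the "Playing with InternLM" (玩转书生大模型) series will publish articles on fine-tuning, deploying, evaluating, and building applications with these models. Subscriptions and submissions are welcome: sharing experience and results helps bring large-model technology to a wider audience.
This article is a community submission by Ding Yichao (丁一超) of the "Domestic Hardware" fine-tuning and deployment interest group. It walks through fine-tuning an InternLM personal assistant with XTuner on a single Ascend 910B card, running on ModelArts.
InternLM repository (the "Read the original" link at the end of the article also leads there):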
https://github.com/InternLM/InternLM
XTuner repository:
https://github.com/InternLM/xtuner
xtuner help
xtuner version
xtuner list-cfg
xtuner list-cfg -p $NAME
xtuner copy-cfg $CONFIG $SAVE_PATH
xtuner train $CONFIG
xtuner convert pth_to_hf $CONFIG $PATH_TO_PTH_MODEL $SAVE_PATH_TO_HF_MODEL
pip install einops
pip install accelerate
pip install dlinfer-ascend
pip install deepspeed
pip install loguru
git clone -b v0.1.23 https://github.com/InternLM/xtuner
git clone -b v0.1.23 https://gitee.com/InternLM/xtuner  # use this mirror instead if GitHub is unreachable
cd xtuner
parser.add_argument('--device', default='npu', choices=('cuda', 'cpu', 'auto', 'npu'), help='Indicate the device')  # add 'npu' to choices; default can also be changed to 'npu'
pip install -e .
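The `--device` edit above matters because `argparse` rejects any value that is not listed in `choices`, so passing `npu` fails until it is added. A minimal sketch of that behavior, assuming only the one argument shown; `build_parser` is a hypothetical stand-in and not XTuner's actual code:

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the patched parser; only --device is modeled."""
    parser = argparse.ArgumentParser(description="device flag sketch")
    parser.add_argument(
        "--device",
        default="npu",
        choices=("cuda", "cpu", "auto", "npu"),  # 'npu' added, as in the edit above
        help="Indicate the device",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # An explicit value is accepted because 'npu' is now in choices.
    print(parser.parse_args(["--device", "npu"]).device)
    # With no flag, the default applies.
    print(parser.parse_args([]).device)
```

A value outside `choices` makes `argparse` print a usage error and exit, which is what happens with `--device npu` on an unpatched parser.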
mkdir -p /home/ma-user/work/work_dir/
cd /home/ma-user/work/work_dir/
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download internlm/internlm2-chat-1_8b --local-dir /home/ma-user/work/model/internlm2-chat-1_8b
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "/home/ma-user/work/model/internlm2-chat-1_8b"  # local path of the model

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, device_map='npu')
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='npu')
model = model.eval()

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""

messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:
    input_text = input("\nUser>>> ")
    input_text = input_text.replace(' ', '')
    if input_text == "exit":
        break

    # stream_chat yields the full response so far; print only the new part
    length = 0
    for response, _ in model.stream_chat(tokenizer, input_text, messages):
        if response is not None:
            print(response[length:], flush=True, end="")
            length = len(response)
python cli_demo.py
cd /home/ma-user/work/work_dir/
mkdir -p datas
touch datas/assistant.json
vim xtuner_generate_assistant.py
import json

# Set the user's name
name = 'JeffDing同志'
# Set how many times the seed data is repeated
n = 8000

# Seed data
data = [
    {"conversation": [{"input": "请介绍一下你自己", "output": "我是{}的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦".format(name)}]},
    {"conversation": [{"input": "你在实战营做什么", "output": "我在这里帮助{}完成XTuner微调个人小助手的任务".format(name)}]}
]

# Append the two seed conversations to the data list n more times
for i in range(n):
    data.append(data[0])
    data.append(data[1])

# Write the data list to 'datas/assistant.json'
with open('datas/assistant.json', 'w', encoding='utf-8') as f:
    # json.dump writes the data to the file as JSON
    # ensure_ascii=False keeps Chinese characters readable
    # indent=4 pretty-prints the file for easier reading
    json.dump(data, f, ensure_ascii=False, indent=4)
- name = 'JeffDing同志'
+ name = "your own name"
python xtuner_generate_assistant.py
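Before training, it can be worth a quick check that the generated file really has the shape the script intends: a list of records, each holding a `conversation` list of turns with `input` and `output` keys. The sketch below is one way to do that; `validate_conversations` is a hypothetical helper written for this article, not part of XTuner.

```python
import json
from pathlib import Path


def validate_conversations(records):
    """Return the number of records; raise ValueError on a malformed one."""
    for idx, record in enumerate(records):
        turns = record.get("conversation") if isinstance(record, dict) else None
        if not isinstance(turns, list) or not turns:
            raise ValueError(f"record {idx}: missing or empty 'conversation' list")
        for turn in turns:
            # Every turn must carry both keys used by the generator script.
            if not isinstance(turn, dict) or not {"input", "output"} <= turn.keys():
                raise ValueError(f"record {idx}: each turn needs 'input' and 'output'")
    return len(records)


if __name__ == "__main__":
    path = Path("datas/assistant.json")
    if path.exists():
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        print(f"{validate_conversations(data)} records look well-formed")
    else:
        print(f"{path} not found; run xtuner_generate_assistant.py first")
```

With `n = 8000` the generator writes 2 seed records plus 2 × 8000 copies, so the count reported here should be 16002.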
xtuner list-cfg -p internlm2
xtuner copy-cfg internlm2_chat_1_8b_qlora_alpaca_e3 .
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
- pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b'
+ pretrained_model_name_or_path = '/home/ma-user/work/model/internlm2-chat-1_8b'

- alpaca_en_path = 'tatsu-lab/alpaca'
+ alpaca_en_path = 'datas/assistant.json'

evaluation_inputs = [
-    '请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
+    '请介绍一下你自己', 'Please introduce yourself'
]
#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
-   dataset=dict(type=load_dataset, path=alpaca_en_path),
+   dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
    tokenizer=tokenizer,
    max_length=max_length,
-   dataset_map_fn=alpaca_map_fn,
+   dataset_map_fn=None,
    template_map_fn=dict(type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)
#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
- quantization_config=dict(
-     type=BitsAndBytesConfig,
-     load_in_4bit=True,
-     load_in_8bit=False,
-     llm_int8_threshold=6.0,
-     llm_int8_has_fp16_weight=False,
-     bnb_4bit_compute_dtype=torch.float16,
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_quant_type='nf4')
xtuner train ./internlm2_chat_1_8b_qlora_alpaca_e3_copy.py
To check NPU usage once fine-tuning has started, run the npu-smi info command.
pth_file=`ls -t ./work_dirs/internlm2_chat_1_8b_qlora_alpaca_e3_copy/*.pth | head -n 1`
xtuner convert pth_to_hf ./internlm2_chat_1_8b_qlora_alpaca_e3_copy.py ${pth_file} ./hf
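The `ls -t ... | head -n 1` step above simply picks the most recently modified `.pth` entry in the work directory. For anyone scripting the conversion from Python instead of the shell, a rough equivalent might look like this; `latest_checkpoint` is a hypothetical helper name, not an XTuner function.

```python
from pathlib import Path


def latest_checkpoint(work_dir, pattern="*.pth"):
    """Return the most recently modified entry matching pattern, or None."""
    root = Path(work_dir)
    if not root.is_dir():
        return None
    candidates = list(root.glob(pattern))
    if not candidates:
        return None
    # Newest modification time wins, mirroring `ls -t | head -n 1`.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    ckpt = latest_checkpoint("./work_dirs/internlm2_chat_1_8b_qlora_alpaca_e3_copy")
    print(ckpt if ckpt is not None else "no checkpoint found yet")
```

The returned path can then be passed as the checkpoint argument of `xtuner convert pth_to_hf`.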
xtuner convert merge /home/ma-user/work/model/internlm2-chat-1_8b ./hf ./merged --max-shard-size 2GB --device npu
- model_name_or_path = "/home/ma-user/work/model/internlm2-chat-1_8b"  # local path of the model
+ model_name_or_path = "/home/ma-user/work/work_dir/merged"  # local path of the model
python cli_demo.py