1. BERT (MLM + NSP)
from transformers import BertTokenizer, BertForMaskedLM, BertForNextSentencePrediction, Trainer, TrainingArguments
import torch

# Load the pretrained tokenizer and task-specific models
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model_mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")                 # MLM head
model_nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")  # NSP head (from the original BERT pre-training setup)

# Example input (MLM): predict the token at the [MASK] position
text = "The cat sits on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model_mlm(**inputs)
# Read the logits at the [MASK] position, not at the last position (which is [SEP])
mask_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_token_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_token_id))  # prints the predicted word (e.g. "mat")

# Example input (NSP): pass the sentences as a pair so the tokenizer inserts
# [SEP] and sets token_type_ids itself
sentence1 = "I like cats."
sentence2 = "They are cute."
sentence3 = "The sky is blue."
inputs_nsp = tokenizer(sentence1, sentence2, return_tensors="pt")      # positive pair
inputs_nsp_neg = tokenizer(sentence1, sentence3, return_tensors="pt")  # negative pair
with torch.no_grad():
    nsp_logits = model_nsp(**inputs_nsp).logits
print(nsp_logits.argmax(dim=-1).item())  # 0 = "B follows A", 1 = "B is random"
# Note: transformers also ships BertForPreTraining, which combines the MLM and NSP heads (sketch below)
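The note above refers to BertForPreTraining, which carries both pre-training heads on one BERT backbone. A minimal sketch of reading both outputs, reusing the sentence pair from above (the output fields prediction_logits and seq_relationship_logits are from the Hugging Face transformers API):

from transformers import BertTokenizer, BertForPreTraining
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

# One sentence-pair input feeds both objectives: the MLM head scores every
# position over the vocabulary, the NSP head classifies the pair from [CLS]
inputs = tokenizer("I like cats.", "They are cute.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.prediction_logits.shape)        # (1, seq_len, vocab_size) -- MLM head
print(outputs.seq_relationship_logits.shape)  # (1, 2)                   -- NSP head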
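Trainer and TrainingArguments are imported above but never used. The sketch below shows one way they could drive MLM fine-tuning; the toy texts, hyperparameters, and the TextDataset wrapper are illustrative assumptions, while DataCollatorForLanguageModeling applies BERT's standard random masking (15% of tokens) on the fly:

import torch
from transformers import (BertTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy in-memory corpus (illustrative); a real run would use a proper dataset
texts = ["The cat sits on the mat.", "I like cats. They are cute."]

class TextDataset(torch.utils.data.Dataset):
    """Minimal wrapper turning tokenized texts into a torch Dataset."""
    def __init__(self, texts):
        self.encodings = [tokenizer(t, truncation=True, max_length=32) for t in texts]
    def __len__(self):
        return len(self.encodings)
    def __getitem__(self, idx):
        return self.encodings[idx]

# The collator masks tokens at random each batch and builds the MLM labels
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="mlm-out", num_train_epochs=1, per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args, train_dataset=TextDataset(texts), data_collator=collator)
trainer.train()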
2. GPT (CLM)
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the pretrained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Input text (CLM task): the model continues the prompt left to right
input_text = "The cat sits on the"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate a continuation; GPT-2 has no pad token, so reuse EOS for padding
outputs = model.generate(**inputs, max_length=20, num_return_sequences=1,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # prints the completed sentence (e.g. "The cat sits on the mat and sleeps.")
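generate covers only the inference side of CLM. For the training objective, passing the input ids as labels makes the model compute the next-token cross-entropy itself (transformers shifts the labels internally by one position, so each token is predicted from the tokens before it). A minimal sketch with an illustrative sentence:

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# CLM objective: labels are the inputs themselves; the model shifts them
# internally so position t is predicted from positions < t
inputs = tokenizer("The cat sits on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs.input_ids)

print(outputs.loss.item())             # average next-token cross-entropy
print(torch.exp(outputs.loss).item())  # sentence perplexity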