Deep learning courses for beginners:
Andrew Ng, "AI for Everyone" (very accessible) https://www.zhihu.com/education/video-course/1556316449043668992
Mu Li's deep learning course (somewhat more in-depth) https://www.zhihu.com/education/video-course/1647604835598092705
There are already many excellent courses covering that broader scope. This course focuses only on the foundational knowledge needed for fine-tuning large models.
Example: a hotel-booking bot
[
    {
        "role": "user",
        "content": "您好,我要找一家舒适型酒店住宿,然后希望酒店提供暖气与行李寄存。"
    },
    {
        "role": "search",
        "arguments": {
            "facilities": [
                "暖气",
                "行李寄存"
            ],
            "type": "舒适型"
        }
    },
    {
        "role": "return",
        "records": [
            {
                "name": "北京香江戴斯酒店",
                "type": "舒适型",
                "address": "北京东城区南河沿大街南湾子胡同1号",
                "subway": "天安门东地铁站B口",
                "phone": "010-65127788",
                "facilities": [
                    "酒店各处提供wifi",
                    "国际长途电话",
                    "吹风机",
                    "24小时热水",
                    "暖气",
                    "西式餐厅",
                    "中式餐厅",
                    "残疾人设施",
                    "会议室",
                    "无烟房",
                    "商务中心",
                    "早餐服务",
                    "接机服务",
                    "接待外宾",
                    "洗衣服务",
                    "行李寄存",
                    "租车",
                    "叫醒服务"
                ],
                "price": 328.0,
                "rating": 4.2,
                "hotel_id": 10
            }
        ]
    },
    {
        "role": "assistant",
        "content": "那推荐您北京香江戴斯酒店,符合您的全部住宿要求并且价格又比较合适的酒店。"
    },
    {
        "role": "user",
        "content": "这家酒店的价格是多少?"
    },
    {
        "role": "assistant",
        "content": "这家酒店的价格是每晚328.0元。"
    },
    {
        "role": "user",
        "content": "好的,那就预订北京香江戴斯酒店吧!"
    },
    {
        "role": "assistant",
        "content": "好的,祝您入住愉快!"
    }
]
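To give a feel for how such a multi-turn, tool-using dialogue can be turned into training text, here is a minimal sketch that serializes each turn into a role-tagged line. The tag names and the JSON serialization of tool calls are illustrative assumptions, not the course's actual preprocessing code.

import json

# Sketch only: serialize each turn into a role-tagged line of plain text.
def dialogue_to_text(turns):
    lines = []
    for turn in turns:
        if turn["role"] in ("user", "assistant"):
            lines.append(f'{turn["role"]}: {turn["content"]}')
        elif turn["role"] == "search":
            # serialize the tool-call arguments as JSON
            lines.append("search: " + json.dumps(turn["arguments"], ensure_ascii=False))
        elif turn["role"] == "return":
            # serialize the tool results as JSON
            lines.append("return: " + json.dumps(turn["records"], ensure_ascii=False))
    return "\n".join(lines)

Feeding the result of dialogue_to_text(...) to a tokenizer would then yield one training sequence per dialogue.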
Hands-on: a simple example
Install the dependencies
# Install with pip
pip install transformers          # install the latest version
pip install transformers==4.30    # install a specific version
# Install with conda
conda install -c huggingface transformers  # only versions 4.0 and later are available
Note:
Do not run the code below directly in a Jupyter notebook; it will hang the kernel!
Download the script `experiments/tiny/train.py` from the left-hand panel and run it on the lab server.
import datasets
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from transformers import AutoModelForCausalLM
from transformers import TrainingArguments, Seq2SeqTrainingArguments
from transformers import Trainer, Seq2SeqTrainer
import transformers
from transformers import DataCollatorWithPadding
from transformers import TextGenerationPipeline
import torch
import numpy as np
import os, re
from tqdm import tqdm
import torch.nn as nn
With HuggingFace you can specify a dataset by name; it is downloaded automatically at run time.
# Dataset name
DATASET_NAME = "rotten_tomatoes"
# Load the dataset
raw_datasets = load_dataset(DATASET_NAME)
# Training split
raw_train_dataset = raw_datasets["train"]
# Validation split
raw_valid_dataset = raw_datasets["validation"]
With HuggingFace you can specify a model by name; it is downloaded automatically at run time.
# Model name
MODEL_NAME = "gpt2"
# Load the model
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
The matching tokenizer is also downloaded automatically when you specify the model name.
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token_id = 0
# Assign other shared variables
# Set the random seed: the same seed reproduces the same random sequence
transformers.set_seed(42)
# Label set
named_labels = ['neg', 'pos']
# Convert labels to token ids
label_ids = [
    tokenizer(named_labels[i], add_special_tokens=False)["input_ids"][0]
    for i in range(len(named_labels))
]
MAX_LEN = 32              # maximum sequence length (input + output)
DATA_BODY_KEY = "text"    # name of the input field in the dataset
DATA_LABEL_KEY = "label"  # name of the output (label) field in the dataset
# Data-processing function: convert raw examples into input_ids, attention_mask, labels
def process_fn(examples):
    model_inputs = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for i in range(len(examples[DATA_BODY_KEY])):
        inputs = tokenizer(examples[DATA_BODY_KEY][i], add_special_tokens=False)
        label = label_ids[examples[DATA_LABEL_KEY][i]]
        input_ids = inputs["input_ids"] + [tokenizer.eos_token_id, label]
        raw_len = len(input_ids)
        input_len = len(inputs["input_ids"]) + 1
        if raw_len >= MAX_LEN:
            input_ids = input_ids[-MAX_LEN:]
            attention_mask = [1] * MAX_LEN
            labels = [-100] * (MAX_LEN - 1) + [label]
        else:
            input_ids = input_ids + [tokenizer.pad_token_id] * (MAX_LEN - raw_len)
            attention_mask = [1] * raw_len + [0] * (MAX_LEN - raw_len)
            labels = [-100] * input_len + [label] + [-100] * (MAX_LEN - raw_len)
        model_inputs["input_ids"].append(input_ids)
        model_inputs["attention_mask"].append(attention_mask)
        model_inputs["labels"].append(labels)
    return model_inputs
# Process the training set
tokenized_train_dataset = raw_train_dataset.map(
    process_fn,
    batched=True,
    remove_columns=raw_train_dataset.column_names,
    desc="Running tokenizer on train dataset",
)
# Process the validation set
tokenized_valid_dataset = raw_valid_dataset.map(
    process_fn,
    batched=True,
    remove_columns=raw_valid_dataset.column_names,
    desc="Running tokenizer on validation dataset",
)
# Define the data collator (builds batches automatically)
collater = DataCollatorWithPadding(
    tokenizer=tokenizer, return_tensors="pt",
)
LR = 2e-5         # learning rate
BATCH_SIZE = 8    # batch size
INTERVAL = 100    # log / evaluate every N steps
# Define the training arguments
training_args = TrainingArguments(
    output_dir="./output",                   # checkpoint directory
    evaluation_strategy="steps",             # evaluation frequency measured in steps
    overwrite_output_dir=True,
    num_train_epochs=1,                      # number of training epochs
    per_device_train_batch_size=BATCH_SIZE,  # per-device training batch size
    gradient_accumulation_steps=1,           # steps to accumulate before each parameter update
    per_device_eval_batch_size=BATCH_SIZE,   # evaluation batch size
    eval_steps=INTERVAL,                     # evaluate every N steps
    logging_steps=INTERVAL,                  # log every N steps
    save_steps=INTERVAL,                     # save a checkpoint every N steps
    learning_rate=LR,                        # learning rate
)
# Save GPU memory
model.gradient_checkpointing_enable()
# Define the trainer
trainer = Trainer(
    model=model,                             # model to be trained
    args=training_args,                      # training arguments
    data_collator=collater,                  # data collator
    train_dataset=tokenized_train_dataset,   # training set
    eval_dataset=tokenized_valid_dataset,    # validation set
    # compute_metrics=compute_metric,        # custom evaluation metrics
)
# Start training
trainer.train()
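Once trainer.train() finishes, a quick sanity check can be helpful. A minimal sketch, assuming the objects defined above (model, tokenizer, label_ids, named_labels) are still in scope; the sample review text is made up for illustration:

# Sanity-check sketch: ask the fine-tuned model to label one made-up review
model.eval()
text = "a deeply moving and beautifully acted film"  # illustrative input, not from the dataset
inputs = tokenizer(text + tokenizer.eos_token, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]           # logits for the token following EOS
pred = int(torch.argmax(logits[label_ids]))          # compare only the 'neg'/'pos' label tokens
print(named_labels[pred])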
Key points:
Memorize the workflow above and you will be able to get a model-training run working.
Understand the material below and you will be able to train a model that actually performs well.
Try this: express the concepts in simple mathematical language.
Think of it as an equation:
Each data point is a pair (x, y); these are constants.
The parameters w are the unknowns; they are the variables.
y = F(x; w) is the expression: we do not know what the true formula looks like, so we assume one that is sufficiently complex (for example, a neural network with a particular structure).
Solving this equation (approximately) is the training process.
Plainly put: training means determining the values of this set of parameters.
We use mathematical (numerical-analysis) methods to find values that make the model perform well enough on the training set.
"Well enough" means that for every data sample (x, y), the value of F(x; w) is as close as possible to y, as written out in the objective below.
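Written out, the objective sketched above takes the standard form (a reconstruction in LaTeX notation, not quoted from the original slides):

\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} \mathrm{loss}\left( F(x_i; w),\; y_i \right)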
A single neuron: y = f(Σ_i w_i·x_i + b)
Connect many neurons together and you get a neural network: the output of one layer feeds the next, e.g. h1 = f(W1·x), h2 = f(W2·h1), h3 = f(W3·h2), ...
Here f is called the activation function, and it comes in many forms.
Activation functions commonly used in today's large models include ReLU, GELU, and Swish.
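For reference, a small sketch of these activations with PyTorch (exact GELU/Swish variants differ slightly across implementations):

import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
relu = F.relu(x)              # max(0, x)
gelu = F.gelu(x)              # Gaussian Error Linear Unit
swish = x * torch.sigmoid(x)  # Swish / SiLU; also available as F.silu(x)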
Think about it: what would happen if there were no activation function here?
We want to find a set of parameters that makes the model's predicted outputs as close as possible to the true outputs.
For this we need (at least) two ingredients: a way to measure how far the prediction is from the truth (a loss function), and a way to adjust the parameters to shrink that gap (an optimization algorithm).
Recall the definition of the gradient: the vector of partial derivatives ∇loss(w) = (∂loss/∂w_1, ..., ∂loss/∂w_n), which points in the direction of steepest increase.
Start from the simplest case: gradient descent on a convex problem.
The gradient determines the direction in which the function changes; by stepping against it at each iteration, w ← w − η·∇loss(w), we converge to an extremum.
Here η is called the learning rate; together with the magnitude of the gradient, it determines how far each step moves.
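A minimal, purely illustrative sketch of this update rule on a toy one-parameter problem (not part of the course code):

# Minimize loss(w) = (w - 3)^2 with plain gradient descent
w, eta = 0.0, 0.1             # initial parameter value and learning rate
for step in range(50):
    grad = 2 * (w - 3)        # d(loss)/dw
    w = w - eta * grad        # update rule: w <- w - eta * grad
print(w)                      # approaches the minimizer w = 3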
Rules of thumb:
For full-parameter training: if resources allow, try a larger batch size first.
For fine-tuning with a small number of trainable parameters: a larger batch size is not necessarily better; watch training stability.
Key point: tune the learning rate appropriately, to avoid getting stuck in a very poor local optimum or skipping over a good solution.
To make the training process converge better, people have designed many more sophisticated optimizers (solvers).
The gap between two values, Mean Squared Error: MSE = (1/N)·Σ_i (y_i − ŷ_i)² (equivalent to the Euclidean distance, see below)
The (Euclidean) distance between two vectors: ‖y − ŷ‖
The angle between two vectors (cosine distance): 1 − (y·ŷ) / (‖y‖·‖ŷ‖)
The difference between two probability distributions, cross entropy: H(p, q) = −Σ_i p_i·log q_i, assuming the distributions p and q are discrete
These loss functions can also be combined (common in model-distillation settings), for example loss = α·loss_1 + (1 − α)·loss_2, where α is a predefined weight, also called a "hyperparameter".
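A short sketch computing these quantities with PyTorch (illustrative values only):

import torch
import torch.nn.functional as F

y_hat = torch.tensor([0.9, 0.1, 0.0])   # predicted vector / distribution
y = torch.tensor([1.0, 0.0, 0.0])       # target vector / distribution

mse = F.mse_loss(y_hat, y)                             # mean squared error
euclid = torch.dist(y_hat, y)                          # Euclidean distance
cosine = 1 - F.cosine_similarity(y_hat, y, dim=0)      # cosine distance
xent = -(y * torch.log(y_hat.clamp_min(1e-9))).sum()   # cross entropy for discrete p, q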
Think about it: can you work out how these loss functions relate to classification, clustering, and regression problems?
Training the simplest possible neural network with PyTorch
A sample from the dataset (MNIST):
The input is a 28×28 image; the output is a label from 0 to 9.
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

BATCH_SIZE = 64          # training batch size
TEST_BATCH_SIZE = 1000   # test batch size
EPOCHS = 15              # number of training epochs
LR = 0.01                # learning rate
SEED = 42                # random seed
LOG_INTERVAL = 100       # log every N batches
# Define a fully-connected (feed-forward) network
class FeedForwardNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layer 1: 784-dim input, 256-dim output -- image size 28×28 = 784
        self.fc1 = nn.Linear(784, 256)
        # Layer 2: 256-dim input, 128-dim output
        self.fc2 = nn.Linear(256, 128)
        # Layer 3: 128-dim input, 64-dim output
        self.fc3 = nn.Linear(128, 64)
        # Layer 4: 64-dim input, 10-dim output -- 10 output classes (0, 1, ..., 9)
        self.fc4 = nn.Linear(64, 10)
        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # Flatten the input into a 1-D vector per sample
        x = x.view(x.shape[0], -1)
        # Each hidden layer uses a ReLU activation followed by dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        # Output a 10-dim (log-)probability distribution
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
# Training loop
def train(model, loss_fn, device, train_loader, optimizer, epoch):
    # Enable training mode (gradients, dropout, etc.)
    model.train()
    for batch_idx, (data_input, true_label) in enumerate(train_loader):
        # Read one batch from the data loader and move it to the GPU (if any)
        data_input, true_label = data_input.to(device), true_label.to(device)
        # Reset the optimizer's gradients (once per batch)
        optimizer.zero_grad()
        # Forward pass: predict outputs from inputs
        output = model(data_input)
        # Compute the loss
        loss = loss_fn(output, true_label)
        # Backward pass: compute the gradients of the loss for this batch
        loss.backward()
        # Let the optimizer update the model parameters from the gradients
        optimizer.step()
        # Periodically print the training loss for the current batch
        if batch_idx % LOG_INTERVAL == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data_input), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
# Compute accuracy and loss on the test set
def test(model, loss_fn, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Sum up the batch loss
            test_loss += loss_fn(output, target, reduction='sum').item()
            # Get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
def main():
    # Check whether a GPU is available
    use_cuda = torch.cuda.is_available()
    # Set the random seed (so results are reproducible)
    torch.manual_seed(SEED)
    # Training device (GPU or CPU)
    device = torch.device("cuda" if use_cuda else "cpu")
    # Set the batch sizes
    train_kwargs = {'batch_size': BATCH_SIZE}
    test_kwargs = {'batch_size': TEST_BATCH_SIZE}
    if use_cuda:
        cuda_kwargs = {'num_workers': 1,
                       'pin_memory': True,
                       'shuffle': True}
        train_kwargs.update(cuda_kwargs)
        test_kwargs.update(cuda_kwargs)
    # Data preprocessing (convert to tensor, normalize values)
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    # Automatically download the MNIST dataset
    dataset_train = datasets.MNIST('data', train=True, download=True,
                                   transform=transform)
    dataset_test = datasets.MNIST('data', train=False,
                                  transform=transform)
    # Data loaders (handle loading, multi-threading, shuffling, batching, etc.)
    train_loader = torch.utils.data.DataLoader(dataset_train, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset_test, **test_kwargs)
    # Create the neural network model
    model = FeedForwardNet().to(device)
    # Choose the optimizer (solver)
    optimizer = optim.SGD(model.parameters(), lr=LR)
    # scheduler = StepLR(optimizer, step_size=1, gamma=0.9)
    # Define the loss function
    # Note: nll_loss applied to log_softmax is equivalent to cross entropy; you can derive this yourself
    # https://blog.csdn.net/weixin_38145317/article/details/103288032
    loss_fn = F.nll_loss
    # Train for N epochs
    for epoch in range(1, EPOCHS + 1):
        train(model, loss_fn, device, train_loader, optimizer, epoch)
        test(model, loss_fn, device, test_loader)
        # scheduler.step()

if __name__ == '__main__':
    main()
Train Epoch: 1 [0/60000 (0%)] Loss: 2.287443
Train Epoch: 1 [6400/60000 (11%)] Loss: 2.284967
Train Epoch: 1 [12800/60000 (21%)] Loss: 2.273498
Train Epoch: 1 [19200/60000 (32%)] Loss: 2.022655
Train Epoch: 1 [25600/60000 (43%)] Loss: 1.680803
Train Epoch: 1 [32000/60000 (53%)] Loss: 1.322924
Train Epoch: 1 [38400/60000 (64%)] Loss: 0.978875
Train Epoch: 1 [44800/60000 (75%)] Loss: 0.955985
Train Epoch: 1 [51200/60000 (85%)] Loss: 0.670422
Train Epoch: 1 [57600/60000 (96%)] Loss: 0.821590
Test set: Average loss: 0.5133, Accuracy: 8522/10000 (85%)
Train Epoch: 2 [0/60000 (0%)] Loss: 0.740346
Train Epoch: 2 [6400/60000 (11%)] Loss: 0.697988
Train Epoch: 2 [12800/60000 (21%)] Loss: 0.676830
Train Epoch: 2 [19200/60000 (32%)] Loss: 0.531716
Train Epoch: 2 [25600/60000 (43%)] Loss: 0.457828
Train Epoch: 2 [32000/60000 (53%)] Loss: 0.621303
Train Epoch: 2 [38400/60000 (64%)] Loss: 0.354285
Train Epoch: 2 [44800/60000 (75%)] Loss: 0.588098
Train Epoch: 2 [51200/60000 (85%)] Loss: 0.530143
Train Epoch: 2 [57600/60000 (96%)] Loss: 0.533157
Test set: Average loss: 0.3203, Accuracy: 9035/10000 (90%)
Train Epoch: 3 [0/60000 (0%)] Loss: 0.425095
Train Epoch: 3 [6400/60000 (11%)] Loss: 0.301024
Train Epoch: 3 [12800/60000 (21%)] Loss: 0.330063
Train Epoch: 3 [19200/60000 (32%)] Loss: 0.362905
Train Epoch: 3 [25600/60000 (43%)] Loss: 0.387243
Train Epoch: 3 [32000/60000 (53%)] Loss: 0.436325
Train Epoch: 3 [38400/60000 (64%)] Loss: 0.266472
Train Epoch: 3 [44800/60000 (75%)] Loss: 0.463275
Train Epoch: 3 [51200/60000 (85%)] Loss: 0.264305
Train Epoch: 3 [57600/60000 (96%)] Loss: 0.480805
Test set: Average loss: 0.2456, Accuracy: 9262/10000 (93%)
Train Epoch: 4 [0/60000 (0%)] Loss: 0.343381
Train Epoch: 4 [6400/60000 (11%)] Loss: 0.222288
Train Epoch: 4 [12800/60000 (21%)] Loss: 0.200421
Train Epoch: 4 [19200/60000 (32%)] Loss: 0.301372
Train Epoch: 4 [25600/60000 (43%)] Loss: 0.282800
Train Epoch: 4 [32000/60000 (53%)] Loss: 0.424678
Train Epoch: 4 [38400/60000 (64%)] Loss: 0.160868
Train Epoch: 4 [44800/60000 (75%)] Loss: 0.373828
Train Epoch: 4 [51200/60000 (85%)] Loss: 0.273351
Train Epoch: 4 [57600/60000 (96%)] Loss: 0.498258
Test set: Average loss: 0.2007, Accuracy: 9388/10000 (94%)
Train Epoch: 5 [0/60000 (0%)] Loss: 0.175644
Train Epoch: 5 [6400/60000 (11%)] Loss: 0.349571
Train Epoch: 5 [12800/60000 (21%)] Loss: 0.231020
Train Epoch: 5 [19200/60000 (32%)] Loss: 0.277835
Train Epoch: 5 [25600/60000 (43%)] Loss: 0.248639
Train Epoch: 5 [32000/60000 (53%)] Loss: 0.338623
Train Epoch: 5 [38400/60000 (64%)] Loss: 0.174397
Train Epoch: 5 [44800/60000 (75%)] Loss: 0.384121
Train Epoch: 5 [51200/60000 (85%)] Loss: 0.238978
Train Epoch: 5 [57600/60000 (96%)] Loss: 0.279425
Test set: Average loss: 0.1718, Accuracy: 9461/10000 (95%)
Train Epoch: 6 [0/60000 (0%)] Loss: 0.146129
Train Epoch: 6 [6400/60000 (11%)] Loss: 0.270161
Train Epoch: 6 [12800/60000 (21%)] Loss: 0.157764
Train Epoch: 6 [19200/60000 (32%)] Loss: 0.274829
Train Epoch: 6 [25600/60000 (43%)] Loss: 0.224544
Train Epoch: 6 [32000/60000 (53%)] Loss: 0.273076
Train Epoch: 6 [38400/60000 (64%)] Loss: 0.091917
Train Epoch: 6 [44800/60000 (75%)] Loss: 0.318671
Train Epoch: 6 [51200/60000 (85%)] Loss: 0.220927
Train Epoch: 6 [57600/60000 (96%)] Loss: 0.353987
Test set: Average loss: 0.1518, Accuracy: 9534/10000 (95%)
Train Epoch: 7 [0/60000 (0%)] Loss: 0.169520
Train Epoch: 7 [6400/60000 (11%)] Loss: 0.189826
Train Epoch: 7 [12800/60000 (21%)] Loss: 0.139019
Train Epoch: 7 [19200/60000 (32%)] Loss: 0.231475
Train Epoch: 7 [25600/60000 (43%)] Loss: 0.213273
Train Epoch: 7 [32000/60000 (53%)] Loss: 0.326085
Train Epoch: 7 [38400/60000 (64%)] Loss: 0.124614
Train Epoch: 7 [44800/60000 (75%)] Loss: 0.314796
Train Epoch: 7 [51200/60000 (85%)] Loss: 0.173023
Train Epoch: 7 [57600/60000 (96%)] Loss: 0.272714
Test set: Average loss: 0.1353, Accuracy: 9580/10000 (96%)
Train Epoch: 8 [0/60000 (0%)] Loss: 0.139128
Train Epoch: 8 [6400/60000 (11%)] Loss: 0.144777
Train Epoch: 8 [12800/60000 (21%)] Loss: 0.102367
Train Epoch: 8 [19200/60000 (32%)] Loss: 0.192136
Train Epoch: 8 [25600/60000 (43%)] Loss: 0.141218
Train Epoch: 8 [32000/60000 (53%)] Loss: 0.234929
Train Epoch: 8 [38400/60000 (64%)] Loss: 0.103180
Train Epoch: 8 [44800/60000 (75%)] Loss: 0.320878
Train Epoch: 8 [51200/60000 (85%)] Loss: 0.156626
Train Epoch: 8 [57600/60000 (96%)] Loss: 0.349143
Test set: Average loss: 0.1235, Accuracy: 9614/10000 (96%)
Train Epoch: 9 [0/60000 (0%)] Loss: 0.136772
Train Epoch: 9 [6400/60000 (11%)] Loss: 0.161713
Train Epoch: 9 [12800/60000 (21%)] Loss: 0.130365
Train Epoch: 9 [19200/60000 (32%)] Loss: 0.111859
Train Epoch: 9 [25600/60000 (43%)] Loss: 0.216221
Train Epoch: 9 [32000/60000 (53%)] Loss: 0.166046
Train Epoch: 9 [38400/60000 (64%)] Loss: 0.077858
Train Epoch: 9 [44800/60000 (75%)] Loss: 0.263344
Train Epoch: 9 [51200/60000 (85%)] Loss: 0.153452
Train Epoch: 9 [57600/60000 (96%)] Loss: 0.358510
Test set: Average loss: 0.1135, Accuracy: 9640/10000 (96%)
Train Epoch: 10 [0/60000 (0%)] Loss: 0.091045
Train Epoch: 10 [6400/60000 (11%)] Loss: 0.185156
Train Epoch: 10 [12800/60000 (21%)] Loss: 0.098523
Train Epoch: 10 [19200/60000 (32%)] Loss: 0.213850
Train Epoch: 10 [25600/60000 (43%)] Loss: 0.112603
Train Epoch: 10 [32000/60000 (53%)] Loss: 0.254803
Train Epoch: 10 [38400/60000 (64%)] Loss: 0.074294
Train Epoch: 10 [44800/60000 (75%)] Loss: 0.200418
Train Epoch: 10 [51200/60000 (85%)] Loss: 0.162135
Train Epoch: 10 [57600/60000 (96%)] Loss: 0.328646
Test set: Average loss: 0.1075, Accuracy: 9678/10000 (97%)
Train Epoch: 11 [0/60000 (0%)] Loss: 0.110555
Train Epoch: 11 [6400/60000 (11%)] Loss: 0.185066
Train Epoch: 11 [12800/60000 (21%)] Loss: 0.154370
Train Epoch: 11 [19200/60000 (32%)] Loss: 0.187331
Train Epoch: 11 [25600/60000 (43%)] Loss: 0.103054
Train Epoch: 11 [32000/60000 (53%)] Loss: 0.097087
Train Epoch: 11 [38400/60000 (64%)] Loss: 0.094759
Train Epoch: 11 [44800/60000 (75%)] Loss: 0.254946
Train Epoch: 11 [51200/60000 (85%)] Loss: 0.163511
Train Epoch: 11 [57600/60000 (96%)] Loss: 0.375709
Test set: Average loss: 0.1001, Accuracy: 9702/10000 (97%)
Train Epoch: 12 [0/60000 (0%)] Loss: 0.085661
Train Epoch: 12 [6400/60000 (11%)] Loss: 0.267067
Train Epoch: 12 [12800/60000 (21%)] Loss: 0.162384
Train Epoch: 12 [19200/60000 (32%)] Loss: 0.181441
Train Epoch: 12 [25600/60000 (43%)] Loss: 0.109263
Train Epoch: 12 [32000/60000 (53%)] Loss: 0.194257
Train Epoch: 12 [38400/60000 (64%)] Loss: 0.065200
Train Epoch: 12 [44800/60000 (75%)] Loss: 0.288888
Train Epoch: 12 [51200/60000 (85%)] Loss: 0.167924
Train Epoch: 12 [57600/60000 (96%)] Loss: 0.311067
Test set: Average loss: 0.0956, Accuracy: 9722/10000 (97%)
Train Epoch: 13 [0/60000 (0%)] Loss: 0.093631
Train Epoch: 13 [6400/60000 (11%)] Loss: 0.079958
Train Epoch: 13 [12800/60000 (21%)] Loss: 0.143489
Train Epoch: 13 [19200/60000 (32%)] Loss: 0.087933
Train Epoch: 13 [25600/60000 (43%)] Loss: 0.094754
Train Epoch: 13 [32000/60000 (53%)] Loss: 0.132700
Train Epoch: 13 [38400/60000 (64%)] Loss: 0.060542
Train Epoch: 13 [44800/60000 (75%)] Loss: 0.212000
Train Epoch: 13 [51200/60000 (85%)] Loss: 0.092904
Train Epoch: 13 [57600/60000 (96%)] Loss: 0.243191
Test set: Average loss: 0.0912, Accuracy: 9739/10000 (97%)
Train Epoch: 14 [0/60000 (0%)] Loss: 0.036139
Train Epoch: 14 [6400/60000 (11%)] Loss: 0.185927
Train Epoch: 14 [12800/60000 (21%)] Loss: 0.094324
Train Epoch: 14 [19200/60000 (32%)] Loss: 0.129941
Train Epoch: 14 [25600/60000 (43%)] Loss: 0.099867
Train Epoch: 14 [32000/60000 (53%)] Loss: 0.175463
Train Epoch: 14 [38400/60000 (64%)] Loss: 0.075817
Train Epoch: 14 [44800/60000 (75%)] Loss: 0.191533
Train Epoch: 14 [51200/60000 (85%)] Loss: 0.105866
Train Epoch: 14 [57600/60000 (96%)] Loss: 0.255987
Test set: Average loss: 0.0875, Accuracy: 9751/10000 (98%)
Train Epoch: 15 [0/60000 (0%)] Loss: 0.049678
Train Epoch: 15 [6400/60000 (11%)] Loss: 0.185887
Train Epoch: 15 [12800/60000 (21%)] Loss: 0.064999
Train Epoch: 15 [19200/60000 (32%)] Loss: 0.106103
Train Epoch: 15 [25600/60000 (43%)] Loss: 0.052213
Train Epoch: 15 [32000/60000 (53%)] Loss: 0.136051
Train Epoch: 15 [38400/60000 (64%)] Loss: 0.037222
Train Epoch: 15 [44800/60000 (75%)] Loss: 0.152106
Train Epoch: 15 [51200/60000 (85%)] Loss: 0.140182
Train Epoch: 15 [57600/60000 (96%)] Loss: 0.216424
Test set: Average loss: 0.0850, Accuracy: 9757/10000 (98%)
How to run this code:
Do not run it directly in a Jupyter notebook.
Download the file `experiments/mnist/train.py` from the left-hand panel to your local machine.
Install the dependencies: pip install torch torchvision
Run it: python3 train.py
Exercise: find a simple dataset on HuggingFace and implement a training run for it yourself.
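As a starting point, a minimal sketch of loading an alternative dataset; the "imdb" name and its "text"/"label" fields are assumptions to verify against the dataset card before reusing the pipeline above:

from datasets import load_dataset

raw = load_dataset("imdb")   # another binary sentiment dataset (assumed available on the Hub)
print(raw["train"][0])       # inspect the field names before adapting process_fn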