
Day 12 of writing every day: Fine-Tuning, for when an open-source model's out-of-the-box capabilities don't meet your business needs and you need a private deployment

Preface

  1. Understand the basic concepts of machine learning
  2. Master the workflow of model training, fine-tuning, and parameter-efficient fine-tuning
  3. Master the key hyperparameters of fine-tuning and parameter-efficient fine-tuning
  4. Master methods and approaches for selecting, preparing, and cleaning training data
  5. Train a large model for a vertical domain

Deep learning courses for beginners:

  1. Andrew Ng's "AI for Everyone" (very accessible): https://www.zhihu.com/education/video-course/1556316449043668992

  2. Mu Li's deep learning course (a bit more in-depth): https://www.zhihu.com/education/video-course/1647604835598092705

There are already many excellent courses covering that broader scope. This course covers only the foundations needed for fine-tuning large models.

When Do You Need Fine-Tuning?

  1. You need a privately deployed model
  2. The out-of-the-box capabilities of open-source models do not meet your business needs

Let's look at an example first

A hotel-booking bot

[
  {
    "role": "user",
    "content": "Hi, I'm looking for a comfort-class hotel, and I'd like it to offer heating and luggage storage."
  },
  {
    "role": "search",
    "arguments": {
      "facilities": [
        "heating",
        "luggage storage"
      ],
      "type": "comfort"
    }
  },
  {
    "role": "return",
    "records": [
      {
        "name": "Beijing Xiangjiang Days Hotel",
        "type": "comfort",
        "address": "1 Nanwanzi Hutong, Nanheyan Street, Dongcheng District, Beijing",
        "subway": "Tian'anmen East subway station, Exit B",
        "phone": "010-65127788",
        "facilities": [
          "wifi throughout the hotel",
          "international calls",
          "hair dryer",
          "24-hour hot water",
          "heating",
          "Western restaurant",
          "Chinese restaurant",
          "accessible facilities",
          "meeting rooms",
          "non-smoking rooms",
          "business center",
          "breakfast service",
          "airport pickup",
          "foreign guests accepted",
          "laundry service",
          "luggage storage",
          "car rental",
          "wake-up call service"
        ],
        "price": 328.0,
        "rating": 4.2,
        "hotel_id": 10
      }
    ]
  },
  {
    "role": "assistant",
    "content": "I'd recommend the Beijing Xiangjiang Days Hotel: it meets all of your requirements and is reasonably priced."
  },
  {
    "role": "user",
    "content": "How much does this hotel cost?"
  },
  {
    "role": "assistant",
    "content": "This hotel costs 328.0 yuan per night."
  },
  {
    "role": "user",
    "content": "Great, please book the Beijing Xiangjiang Days Hotel!"
  },
  {
    "role": "assistant",
    "content": "Done. Enjoy your stay!"
  }
]

1. Getting a Feel for It

Let's get hands-on with a simple example:

1.1 Tools: Hugging Face, a powerful library for model training

Install the dependencies

# install with pip
pip install transformers          # install the latest version
pip install transformers==4.30    # install a specific version
# install with conda
conda install -c huggingface transformers   # only versions 4.0 and later are available this way
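
If you want to confirm the installation works, a minimal sanity check (my own optional addition, not part of the original steps; it downloads a small default English model on first run) looks like this:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")            # loads a small default sentiment model
print(classifier("This movie was surprisingly good!"))
# expected: something like [{'label': 'POSITIVE', 'score': 0.99}]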

1.2 Workflow

Note: the walkthrough below fine-tunes GPT-2 on the rotten_tomatoes sentiment-classification dataset.

  1. Import the required libraries
import datasets
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from transformers import AutoModelForCausalLM
from transformers import TrainingArguments, Seq2SeqTrainingArguments
from transformers import Trainer, Seq2SeqTrainer
import transformers
from transformers import DataCollatorWithPadding
from transformers import TextGenerationPipeline
import torch
import numpy as np
import os, re
from tqdm import tqdm
import torch.nn as nn
  2. Load the dataset

With Hugging Face you can specify a dataset by name, and it is downloaded automatically at runtime.

# dataset name
DATASET_NAME = "rotten_tomatoes"

# load the dataset
raw_datasets = load_dataset(DATASET_NAME)

# training split
raw_train_dataset = raw_datasets["train"]

# validation split
raw_valid_dataset = raw_datasets["validation"]
  3. Load the model

With Hugging Face you can specify a model by name, and it is downloaded automatically at runtime.

# model name
MODEL_NAME = "gpt2"

# load the model
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
  4. Load the tokenizer

With Hugging Face you can specify a model by name, and the matching tokenizer is downloaded automatically at runtime.

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token_id = 0
# assign other shared variables

# set the random seed: the same seed reproduces the same random sequence
transformers.set_seed(42)

# label set
named_labels = ['neg', 'pos']

# convert the labels to token ids
label_ids = [
    tokenizer(named_labels[i], add_special_tokens=False)["input_ids"][0]
    for i in range(len(named_labels))
]
  5. Process the dataset: convert it into the input format the model accepts

MAX_LEN = 32              # maximum sequence length (input + output)
DATA_BODY_KEY = "text"    # name of the input field in the dataset
DATA_LABEL_KEY = "label"  # name of the output field in the dataset

# data-processing function: convert raw examples into input_ids, attention_mask, labels
def process_fn(examples):
    model_inputs = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for i in range(len(examples[DATA_BODY_KEY])):
        inputs = tokenizer(examples[DATA_BODY_KEY][i], add_special_tokens=False)
        label = label_ids[examples[DATA_LABEL_KEY][i]]
        input_ids = inputs["input_ids"] + [tokenizer.eos_token_id, label]

        raw_len = len(input_ids)
        input_len = len(inputs["input_ids"]) + 1

        if raw_len >= MAX_LEN:
            input_ids = input_ids[-MAX_LEN:]
            attention_mask = [1] * MAX_LEN
            labels = [-100] * (MAX_LEN - 1) + [label]
        else:
            input_ids = input_ids + [tokenizer.pad_token_id] * (MAX_LEN - raw_len)
            attention_mask = [1] * raw_len + [0] * (MAX_LEN - raw_len)
            labels = [-100] * input_len + [label] + [-100] * (MAX_LEN - raw_len)
        model_inputs["input_ids"].append(input_ids)
        model_inputs["attention_mask"].append(attention_mask)
        model_inputs["labels"].append(labels)
    return model_inputs
# process the training set
tokenized_train_dataset = raw_train_dataset.map(
    process_fn,
    batched=True,
    remove_columns=raw_train_dataset.column_names,
    desc="Running tokenizer on train dataset",
)

# process the validation set
tokenized_valid_dataset = raw_valid_dataset.map(
    process_fn,
    batched=True,
    remove_columns=raw_valid_dataset.column_names,
    desc="Running tokenizer on validation dataset",
)
  6. Define the data collator: automatically splits the data into batches during training

# define the data collator (builds batches automatically)
collater = DataCollatorWithPadding(
    tokenizer=tokenizer, return_tensors="pt",
)
  7. Define the training hyperparameters, e.g. the learning rate

LR = 2e-5        # learning rate
BATCH_SIZE = 8   # batch size
INTERVAL = 100   # log / evaluate every this many steps

# define the training arguments
training_args = TrainingArguments(
    output_dir="./output",                     # checkpoint directory
    evaluation_strategy="steps",               # evaluation frequency measured in steps
    overwrite_output_dir=True,
    num_train_epochs=1,                        # number of training epochs
    per_device_train_batch_size=BATCH_SIZE,    # per-device training batch size
    gradient_accumulation_steps=1,             # steps to accumulate before each parameter update
    per_device_eval_batch_size=BATCH_SIZE,     # evaluation batch size
    eval_steps=INTERVAL,                       # evaluate every N steps
    logging_steps=INTERVAL,                    # log every N steps
    save_steps=INTERVAL,                       # save a checkpoint every N steps
    learning_rate=LR,                          # learning rate
)
  8. Define the trainer

# save GPU memory
model.gradient_checkpointing_enable()

# define the trainer
trainer = Trainer(
    model=model,                              # model to train
    args=training_args,                       # training arguments
    data_collator=collater,                   # data collator
    train_dataset=tokenized_train_dataset,    # training set
    eval_dataset=tokenized_valid_dataset,     # validation set
    # compute_metrics=compute_metric,         # custom evaluation metrics
)
  9. Start training

# start training
trainer.train()
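
The steps above end at trainer.train(). If you want to sanity-check the fine-tuned model afterwards, a minimal sketch (my own addition, following the data format defined in step 5: text, then the EOS token, then a 'neg'/'pos' label token) could look like this:

# sanity-check the fine-tuned classifier on a new review (illustrative sketch)
model.eval()
text = "This movie was full of incredible performances."
ids = tokenizer(text, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]
input_ids = torch.tensor([ids]).to(model.device)
with torch.no_grad():
    next_token_logits = model(input_ids=input_ids).logits[0, -1]
# compare only the logits of the 'neg' and 'pos' label tokens
pred = named_labels[int(torch.argmax(next_token_logits[label_ids]))]
print(pred)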

Summary of the process above

  1. Load the dataset
  2. Preprocess the data: convert it into the format the model accepts (input_ids, attention_mask, labels)
  3. Load the model and tokenizer
  4. Define the data collator, training hyperparameters, and trainer
  5. Train and evaluate

2. What Is a Model?

Each artificial neuron computes σ(w·x + b): a weighted sum of its inputs plus a bias, passed through a function σ. Connecting many neurons together gives a neural network: h1 = σ(W1·x + b1), h2 = σ(W2·h1 + b2), ...

Here σ is called the activation function, and it comes in many forms.

Activation functions commonly used in today's large models include ReLU, GELU, and Swish.


Think about it: what would happen if there were no activation function here?
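
As a concrete illustration (my own sketch, not part of the original text), here are the three activations in PyTorch, plus a hint at the thought question: without an activation, stacked linear layers collapse into a single linear map.

import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
print(F.relu(x))   # ReLU: max(0, x)
print(F.gelu(x))   # GELU: a smooth variant widely used in Transformers
print(F.silu(x))   # Swish / SiLU: x * sigmoid(x)

# without an activation, two linear layers are equivalent to a single linear layer
W1, W2 = torch.randn(4, 3), torch.randn(2, 4)
v = torch.randn(3)
print(torch.allclose(W2 @ (W1 @ v), (W2 @ W1) @ v))   # True: still just a linear function of v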

3. What Is Model Training?

We want to find a set of parameters θ such that the model's predicted output ŷ is as close as possible to the true output y.

For this we need (at least) two ingredients:

3.1 Model training is essentially the process of solving an optimization problem: find the parameters that minimize a loss function L(θ) measuring the gap between predictions and ground truth, i.e. θ* = argmin_θ L(θ)

3.2 How do we solve it?

Recall the definition of the gradient: the vector of partial derivatives ∇L(θ), which points in the direction of steepest increase.

Start from the simplest case: gradient descent on a convex problem.

The gradient determines the direction in which the function changes; by repeatedly applying the update θ ← θ - η·∇L(θ), we converge to an extremum.

Here η is called the learning rate; together with the magnitude of the gradient, it determines how far each step moves.
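
To make the update rule concrete, here is a tiny illustrative sketch (my own example) of gradient descent on a one-dimensional convex loss:

# gradient descent on the convex loss L(theta) = (theta - 3)^2
theta = 0.0
lr = 0.1                       # learning rate (eta)
for step in range(100):
    grad = 2 * (theta - 3)     # dL/dtheta
    theta = theta - lr * grad  # step opposite to the gradient
print(theta)                   # ends up very close to the minimizer theta = 3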

3.3 Reality is never that simple (1): computing the gradient over the entire dataset is far too expensive

In practice, the gradient is estimated on a small, randomly sampled mini-batch at each step (stochastic / mini-batch gradient descent).

3.4 Reality is never that simple (2): deep learning has no global optimum (the problem is non-convex)


3.5 Reality is never that simple (3): the learning rate also matters a lot, and may even need to be adjusted dynamically

Key point: tune the learning rate appropriately, so that you neither get stuck in a poor local optimum nor step over a good one.

4. Optimizers (Solvers)

To make training converge better, people have designed many more sophisticated optimizers.
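
For concreteness (my own examples; these are common choices rather than the original's exact list), here is how a few widely used optimizers are constructed in torch.optim:

import torch

params = [torch.nn.Parameter(torch.randn(3))]   # stand-in for model.parameters()
opt_sgd   = torch.optim.SGD(params, lr=0.01, momentum=0.9)         # SGD with momentum
opt_adam  = torch.optim.Adam(params, lr=1e-3)                      # adaptive per-parameter step sizes
opt_adamw = torch.optim.AdamW(params, lr=1e-3, weight_decay=0.01)  # Adam with decoupled weight decay, common for Transformers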

5. Some Commonly Used Loss Functions

Think about it: can you relate these loss functions to classification, clustering, and regression problems?
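
For concreteness (again my own examples), a few common loss functions in PyTorch and the problem types they are typically paired with:

import torch
import torch.nn as nn

mse = nn.MSELoss()            # mean squared error: regression
ce  = nn.CrossEntropyLoss()   # cross entropy: multi-class classification
bce = nn.BCEWithLogitsLoss()  # binary cross entropy on logits: binary classification

logits  = torch.randn(4, 10)           # a batch of 4 examples, 10 classes
targets = torch.randint(0, 10, (4,))   # true class indices
print(ce(logits, targets))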

6. A Hands-On Review of the Process Above

Train the simplest possible neural network with PyTorch.

A sample from the dataset (MNIST):

The input is a 28×28 image; the output is a label from 0 to 9.

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

BATCH_SIZE = 64
TEST_BATCH_SIZE = 1000
EPOCHS = 15
LR = 0.01
SEED = 42
LOG_INTERVAL = 100

# define a fully connected network
class FeedForwardNet(nn.Module):
    def __init__(self):
        super().__init__()
        # first layer: 784-dim input, 256-dim output (image size 28×28 = 784)
        self.fc1 = nn.Linear(784, 256)
        # second layer: 256-dim input, 128-dim output
        self.fc2 = nn.Linear(256, 128)
        # third layer: 128-dim input, 64-dim output
        self.fc3 = nn.Linear(128, 64)
        # fourth layer: 64-dim input, 10-dim output (10 classes: 0, 1, ..., 9)
        self.fc4 = nn.Linear(64, 10)

        # dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # flatten the input into a 1-D vector
        x = x.view(x.shape[0], -1)

        # each layer uses a ReLU activation, followed by dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output a 10-dimensional log-probability distribution
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

# training loop
def train(model, loss_fn, device, train_loader, optimizer, epoch):
    # switch the model to training mode (enables gradients, dropout, etc.)
    model.train()
    for batch_idx, (data_input, true_label) in enumerate(train_loader):
        # read one batch from the data loader
        # move the data to the GPU (if available)
        data_input, true_label = data_input.to(device), true_label.to(device)
        # reset the optimizer (once per batch)
        optimizer.zero_grad()
        # forward pass: predict the output from the input
        output = model(data_input)
        # compute the loss
        loss = loss_fn(output, true_label)
        # backward pass: compute the gradient of the loss for this batch
        loss.backward()
        # the optimizer updates the model parameters from the gradient
        optimizer.step()

        # periodically print the training loss of the current batch
        if batch_idx % LOG_INTERVAL == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data_input), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


# compute accuracy and loss on the test set
def test(model, loss_fn, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += loss_fn(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # check whether a GPU is available
    use_cuda = torch.cuda.is_available()

    # set the random seed (for reproducibility)
    torch.manual_seed(SEED)

    # training device (GPU or CPU)
    device = torch.device("cuda" if use_cuda else "cpu")

    # set the batch sizes
    train_kwargs = {'batch_size': BATCH_SIZE}
    test_kwargs = {'batch_size': TEST_BATCH_SIZE}

    if use_cuda:
        cuda_kwargs = {'num_workers': 1,
                       'pin_memory': True,
                       'shuffle': True}
        train_kwargs.update(cuda_kwargs)
        test_kwargs.update(cuda_kwargs)

    # data preprocessing (convert to tensor, normalize values)
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    # automatically download the MNIST dataset
    dataset_train = datasets.MNIST('data', train=True, download=True,
                                   transform=transform)
    dataset_test = datasets.MNIST('data', train=False,
                                  transform=transform)

    # define the data loaders (handle loading, multi-threading, shuffling, batching, etc.)
    train_loader = torch.utils.data.DataLoader(dataset_train, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset_test, **test_kwargs)

    # create the neural network model
    model = FeedForwardNet().to(device)

    # choose the optimizer (solver)
    optimizer = optim.SGD(model.parameters(), lr=LR)
    # scheduler = StepLR(optimizer, step_size=1, gamma=0.9)

    # define the loss function
    # note: nll applied to log_softmax is equivalent to cross entropy; try deriving this yourself
    # https://blog.csdn.net/weixin_38145317/article/details/103288032
    loss_fn = F.nll_loss

    # train for N epochs
    for epoch in range(1, EPOCHS + 1):
        train(model, loss_fn, device, train_loader, optimizer, epoch)
        test(model, loss_fn, device, test_loader)
        # scheduler.step()


if __name__ == '__main__':
    main()
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.287443
Train Epoch: 1 [6400/60000 (11%)] Loss: 2.284967
Train Epoch: 1 [12800/60000 (21%)] Loss: 2.273498
Train Epoch: 1 [19200/60000 (32%)] Loss: 2.022655
Train Epoch: 1 [25600/60000 (43%)] Loss: 1.680803
Train Epoch: 1 [32000/60000 (53%)] Loss: 1.322924
Train Epoch: 1 [38400/60000 (64%)] Loss: 0.978875
Train Epoch: 1 [44800/60000 (75%)] Loss: 0.955985
Train Epoch: 1 [51200/60000 (85%)] Loss: 0.670422
Train Epoch: 1 [57600/60000 (96%)] Loss: 0.821590

Test set: Average loss: 0.5133, Accuracy: 8522/10000 (85%)

Train Epoch: 2 [0/60000 (0%)] Loss: 0.740346
Train Epoch: 2 [6400/60000 (11%)] Loss: 0.697988
Train Epoch: 2 [12800/60000 (21%)] Loss: 0.676830
Train Epoch: 2 [19200/60000 (32%)] Loss: 0.531716
Train Epoch: 2 [25600/60000 (43%)] Loss: 0.457828
Train Epoch: 2 [32000/60000 (53%)] Loss: 0.621303
Train Epoch: 2 [38400/60000 (64%)] Loss: 0.354285
Train Epoch: 2 [44800/60000 (75%)] Loss: 0.588098
Train Epoch: 2 [51200/60000 (85%)] Loss: 0.530143
Train Epoch: 2 [57600/60000 (96%)] Loss: 0.533157

Test set: Average loss: 0.3203, Accuracy: 9035/10000 (90%)

Train Epoch: 3 [0/60000 (0%)] Loss: 0.425095
Train Epoch: 3 [6400/60000 (11%)] Loss: 0.301024
Train Epoch: 3 [12800/60000 (21%)] Loss: 0.330063
Train Epoch: 3 [19200/60000 (32%)] Loss: 0.362905
Train Epoch: 3 [25600/60000 (43%)] Loss: 0.387243
Train Epoch: 3 [32000/60000 (53%)] Loss: 0.436325
Train Epoch: 3 [38400/60000 (64%)] Loss: 0.266472
Train Epoch: 3 [44800/60000 (75%)] Loss: 0.463275
Train Epoch: 3 [51200/60000 (85%)] Loss: 0.264305
Train Epoch: 3 [57600/60000 (96%)] Loss: 0.480805

Test set: Average loss: 0.2456, Accuracy: 9262/10000 (93%)

Train Epoch: 4 [0/60000 (0%)] Loss: 0.343381
Train Epoch: 4 [6400/60000 (11%)] Loss: 0.222288
Train Epoch: 4 [12800/60000 (21%)] Loss: 0.200421
Train Epoch: 4 [19200/60000 (32%)] Loss: 0.301372
Train Epoch: 4 [25600/60000 (43%)] Loss: 0.282800
Train Epoch: 4 [32000/60000 (53%)] Loss: 0.424678
Train Epoch: 4 [38400/60000 (64%)] Loss: 0.160868
Train Epoch: 4 [44800/60000 (75%)] Loss: 0.373828
Train Epoch: 4 [51200/60000 (85%)] Loss: 0.273351
Train Epoch: 4 [57600/60000 (96%)] Loss: 0.498258

Test set: Average loss: 0.2007, Accuracy: 9388/10000 (94%)

Train Epoch: 5 [0/60000 (0%)] Loss: 0.175644
Train Epoch: 5 [6400/60000 (11%)] Loss: 0.349571
Train Epoch: 5 [12800/60000 (21%)] Loss: 0.231020
Train Epoch: 5 [19200/60000 (32%)] Loss: 0.277835
Train Epoch: 5 [25600/60000 (43%)] Loss: 0.248639
Train Epoch: 5 [32000/60000 (53%)] Loss: 0.338623
Train Epoch: 5 [38400/60000 (64%)] Loss: 0.174397
Train Epoch: 5 [44800/60000 (75%)] Loss: 0.384121
Train Epoch: 5 [51200/60000 (85%)] Loss: 0.238978
Train Epoch: 5 [57600/60000 (96%)] Loss: 0.279425

Test set: Average loss: 0.1718, Accuracy: 9461/10000 (95%)

Train Epoch: 6 [0/60000 (0%)] Loss: 0.146129
Train Epoch: 6 [6400/60000 (11%)] Loss: 0.270161
Train Epoch: 6 [12800/60000 (21%)] Loss: 0.157764
Train Epoch: 6 [19200/60000 (32%)] Loss: 0.274829
Train Epoch: 6 [25600/60000 (43%)] Loss: 0.224544
Train Epoch: 6 [32000/60000 (53%)] Loss: 0.273076
Train Epoch: 6 [38400/60000 (64%)] Loss: 0.091917
Train Epoch: 6 [44800/60000 (75%)] Loss: 0.318671
Train Epoch: 6 [51200/60000 (85%)] Loss: 0.220927
Train Epoch: 6 [57600/60000 (96%)] Loss: 0.353987

Test set: Average loss: 0.1518, Accuracy: 9534/10000 (95%)

Train Epoch: 7 [0/60000 (0%)] Loss: 0.169520
Train Epoch: 7 [6400/60000 (11%)] Loss: 0.189826
Train Epoch: 7 [12800/60000 (21%)] Loss: 0.139019
Train Epoch: 7 [19200/60000 (32%)] Loss: 0.231475
Train Epoch: 7 [25600/60000 (43%)] Loss: 0.213273
Train Epoch: 7 [32000/60000 (53%)] Loss: 0.326085
Train Epoch: 7 [38400/60000 (64%)] Loss: 0.124614
Train Epoch: 7 [44800/60000 (75%)] Loss: 0.314796
Train Epoch: 7 [51200/60000 (85%)] Loss: 0.173023
Train Epoch: 7 [57600/60000 (96%)] Loss: 0.272714

Test set: Average loss: 0.1353, Accuracy: 9580/10000 (96%)

Train Epoch: 8 [0/60000 (0%)] Loss: 0.139128
Train Epoch: 8 [6400/60000 (11%)] Loss: 0.144777
Train Epoch: 8 [12800/60000 (21%)] Loss: 0.102367
Train Epoch: 8 [19200/60000 (32%)] Loss: 0.192136
Train Epoch: 8 [25600/60000 (43%)] Loss: 0.141218
Train Epoch: 8 [32000/60000 (53%)] Loss: 0.234929
Train Epoch: 8 [38400/60000 (64%)] Loss: 0.103180
Train Epoch: 8 [44800/60000 (75%)] Loss: 0.320878
Train Epoch: 8 [51200/60000 (85%)] Loss: 0.156626
Train Epoch: 8 [57600/60000 (96%)] Loss: 0.349143

Test set: Average loss: 0.1235, Accuracy: 9614/10000 (96%)

Train Epoch: 9 [0/60000 (0%)] Loss: 0.136772
Train Epoch: 9 [6400/60000 (11%)] Loss: 0.161713
Train Epoch: 9 [12800/60000 (21%)] Loss: 0.130365
Train Epoch: 9 [19200/60000 (32%)] Loss: 0.111859
Train Epoch: 9 [25600/60000 (43%)] Loss: 0.216221
Train Epoch: 9 [32000/60000 (53%)] Loss: 0.166046
Train Epoch: 9 [38400/60000 (64%)] Loss: 0.077858
Train Epoch: 9 [44800/60000 (75%)] Loss: 0.263344
Train Epoch: 9 [51200/60000 (85%)] Loss: 0.153452
Train Epoch: 9 [57600/60000 (96%)] Loss: 0.358510

Test set: Average loss: 0.1135, Accuracy: 9640/10000 (96%)

Train Epoch: 10 [0/60000 (0%)] Loss: 0.091045
Train Epoch: 10 [6400/60000 (11%)] Loss: 0.185156
Train Epoch: 10 [12800/60000 (21%)] Loss: 0.098523
Train Epoch: 10 [19200/60000 (32%)] Loss: 0.213850
Train Epoch: 10 [25600/60000 (43%)] Loss: 0.112603
Train Epoch: 10 [32000/60000 (53%)] Loss: 0.254803
Train Epoch: 10 [38400/60000 (64%)] Loss: 0.074294
Train Epoch: 10 [44800/60000 (75%)] Loss: 0.200418
Train Epoch: 10 [51200/60000 (85%)] Loss: 0.162135
Train Epoch: 10 [57600/60000 (96%)] Loss: 0.328646

Test set: Average loss: 0.1075, Accuracy: 9678/10000 (97%)

Train Epoch: 11 [0/60000 (0%)] Loss: 0.110555
Train Epoch: 11 [6400/60000 (11%)] Loss: 0.185066
Train Epoch: 11 [12800/60000 (21%)] Loss: 0.154370
Train Epoch: 11 [19200/60000 (32%)] Loss: 0.187331
Train Epoch: 11 [25600/60000 (43%)] Loss: 0.103054
Train Epoch: 11 [32000/60000 (53%)] Loss: 0.097087
Train Epoch: 11 [38400/60000 (64%)] Loss: 0.094759
Train Epoch: 11 [44800/60000 (75%)] Loss: 0.254946
Train Epoch: 11 [51200/60000 (85%)] Loss: 0.163511
Train Epoch: 11 [57600/60000 (96%)] Loss: 0.375709

Test set: Average loss: 0.1001, Accuracy: 9702/10000 (97%)

Train Epoch: 12 [0/60000 (0%)] Loss: 0.085661
Train Epoch: 12 [6400/60000 (11%)] Loss: 0.267067
Train Epoch: 12 [12800/60000 (21%)] Loss: 0.162384
Train Epoch: 12 [19200/60000 (32%)] Loss: 0.181441
Train Epoch: 12 [25600/60000 (43%)] Loss: 0.109263
Train Epoch: 12 [32000/60000 (53%)] Loss: 0.194257
Train Epoch: 12 [38400/60000 (64%)] Loss: 0.065200
Train Epoch: 12 [44800/60000 (75%)] Loss: 0.288888
Train Epoch: 12 [51200/60000 (85%)] Loss: 0.167924
Train Epoch: 12 [57600/60000 (96%)] Loss: 0.311067

Test set: Average loss: 0.0956, Accuracy: 9722/10000 (97%)

Train Epoch: 13 [0/60000 (0%)] Loss: 0.093631
Train Epoch: 13 [6400/60000 (11%)] Loss: 0.079958
Train Epoch: 13 [12800/60000 (21%)] Loss: 0.143489
Train Epoch: 13 [19200/60000 (32%)] Loss: 0.087933
Train Epoch: 13 [25600/60000 (43%)] Loss: 0.094754
Train Epoch: 13 [32000/60000 (53%)] Loss: 0.132700
Train Epoch: 13 [38400/60000 (64%)] Loss: 0.060542
Train Epoch: 13 [44800/60000 (75%)] Loss: 0.212000
Train Epoch: 13 [51200/60000 (85%)] Loss: 0.092904
Train Epoch: 13 [57600/60000 (96%)] Loss: 0.243191

Test set: Average loss: 0.0912, Accuracy: 9739/10000 (97%)

Train Epoch: 14 [0/60000 (0%)] Loss: 0.036139
Train Epoch: 14 [6400/60000 (11%)] Loss: 0.185927
Train Epoch: 14 [12800/60000 (21%)] Loss: 0.094324
Train Epoch: 14 [19200/60000 (32%)] Loss: 0.129941
Train Epoch: 14 [25600/60000 (43%)] Loss: 0.099867
Train Epoch: 14 [32000/60000 (53%)] Loss: 0.175463
Train Epoch: 14 [38400/60000 (64%)] Loss: 0.075817
Train Epoch: 14 [44800/60000 (75%)] Loss: 0.191533
Train Epoch: 14 [51200/60000 (85%)] Loss: 0.105866
Train Epoch: 14 [57600/60000 (96%)] Loss: 0.255987

Test set: Average loss: 0.0875, Accuracy: 9751/10000 (98%)

Train Epoch: 15 [0/60000 (0%)] Loss: 0.049678
Train Epoch: 15 [6400/60000 (11%)] Loss: 0.185887
Train Epoch: 15 [12800/60000 (21%)] Loss: 0.064999
Train Epoch: 15 [19200/60000 (32%)] Loss: 0.106103
Train Epoch: 15 [25600/60000 (43%)] Loss: 0.052213
Train Epoch: 15 [32000/60000 (53%)] Loss: 0.136051
Train Epoch: 15 [38400/60000 (64%)] Loss: 0.037222
Train Epoch: 15 [44800/60000 (75%)] Loss: 0.152106
Train Epoch: 15 [51200/60000 (85%)] Loss: 0.140182
Train Epoch: 15 [57600/60000 (96%)] Loss: 0.216424

Test set: Average loss: 0.0850, Accuracy: 9757/10000 (98%)

How to run this code:

  1. Do not run it directly in the Jupyter notebook

  2. Download the `experiments/mnist/train.py` file (in the file panel on the left) to your local machine

  3. Install the dependencies: pip install torch torchvision

  4. Run it: python3 train.py

Give it a try

Find a simple dataset on Hugging Face and implement a training run yourself






