“language modeling performance improves smoothly and predictably”
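"Smoothly and predictably" usually refers to test loss following a power law in scale. The sketch below assumes the single-variable form L(N) = (N_c / N)^alpha_N, popularized by Kaplan et al. (2020). The constants ALPHA_N and N_C are illustrative assumptions of roughly the reported magnitude, not values fitted here, and the function name `loss_from_params` is hypothetical.

```python
# Minimal sketch: loss as a power law in model size N, L(N) = (N_c / N) ** alpha_N.
# ALPHA_N and N_C are illustrative assumptions, not fitted values.
ALPHA_N = 0.076
N_C = 8.8e13


def loss_from_params(n_params: float) -> float:
    """Predicted test loss for a model with n_params (non-embedding) parameters."""
    return (N_C / n_params) ** ALPHA_N


if __name__ == "__main__":
    prev = None
    for n in [1e6, 1e7, 1e8, 1e9, 1e10]:
        loss = loss_from_params(n)
        # Every 10x increase in N multiplies the loss by the same factor, 10 ** -ALPHA_N.
        ratio = "" if prev is None else f"  ratio={loss / prev:.4f}"
        print(f"N={n:.0e}  L={loss:.3f}{ratio}")
        prev = loss
```

The constant ratio between successive decades is what makes the curve predictable: a fit on small models extrapolates to larger ones, as long as the power law continues to hold.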
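“Models with more parameters learn more efficiently than smaller models: reaching the same performance takes them less training data and fewer training steps.” (sample efficiency)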
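Sample efficiency can be made concrete by asking how much data a model of size N needs to reach a fixed target loss. The sketch below assumes a joint power law of the form L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D, as in Kaplan et al. (2020), and inverts it for D. All four constants are illustrative assumptions, and the helper names `loss` and `tokens_needed` are hypothetical. It only covers training data, not the number of optimization steps.

```python
import math

# Illustrative constants (assumptions, not fitted values).
ALPHA_N = 0.076
ALPHA_D = 0.095
N_C = 8.8e13
D_C = 5.4e13


def loss(n_params: float, n_tokens: float) -> float:
    """Joint power law L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D


def tokens_needed(target_loss: float, n_params: float) -> float:
    """Tokens D a model of size N needs to reach target_loss (inf if unreachable)."""
    # Invert L(N, D) for D: D = D_c / (L^(1/alpha_D) - (N_c/N)^(alpha_N/alpha_D)).
    slack = target_loss ** (1 / ALPHA_D) - (N_C / n_params) ** (ALPHA_N / ALPHA_D)
    if slack <= 0:
        # Even unlimited data cannot bring this model below target_loss.
        return math.inf
    return D_C / slack


if __name__ == "__main__":
    for n in [1e7, 1e8, 1e9, 1e10]:
        print(f"N={n:.0e}  tokens to reach L=3.0: {tokens_needed(3.0, n):.3g}")
```

Under this form, a larger N shrinks the model-size term, which leaves more room for the data term, so the required D falls as N grows. This is the sample-efficiency claim expressed in terms of data.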