To predict upcoming text, language models must in some cases retrieve in-context information verbatim. In this report, we investigated how the ability of language models to retrieve arbitrary in-context nouns develops during training (across time) and as language models trained on the same dataset increase in size (across scale). We then asked whether the learning of in-context retrieval correlates with learning on more challenging zero-shot benchmarks. Furthermore, inspired by semantic effects in human short-term memory, we evaluated retrieval with respect to a major semantic property of the target nouns, namely whether they denote a concrete or an abstract entity, as rated by humans. We show that verbatim in-context retrieval developed in a sudden transition early in training, after about 1% of the training tokens. This transition was observed across model sizes (from 14M to 12B parameters), though it occurred slightly later in the two smallest models. We further found that the development of verbatim in-context retrieval correlates positively with learning on zero-shot benchmarks. Around the transition point, all models showed an advantage for retrieving concrete nouns over abstract nouns; in all but the two smallest models, this advantage dissipated toward the end of training.