In this work, we propose a simple theoretical framework, Pelican Soup, aiming to better understand how pretraining allows LLMs to (1) generalize to unseen instructions and (2) perform in-context learning, even when the verbalizers are irrelevant to the task. To this end, our framework introduces the notions of a "knowledge base" and a "reference-sense association," together with a simple formalism for natural language processing tasks. The framework demonstrates how studies in linguistics, psychology, and philosophy can inform our understanding of language models, and it connects to several existing theoretical results. As an illustration of its use, we derive a bound on the in-context learning loss within our framework. Finally, we support the framework with empirical experiments and outline possible directions for future research.