Many recent language models (LMs) are capable of in-context learning (ICL), manifested in the LMs' ability to perform a new task solely from a natural-language instruction. Previous work curating in-context learners assumes that ICL emerges from vast over-parametrization or the scale of multi-task training. However, recent theoretical work attributes the ICL ability to concept-dependent training data and creates functional in-context learners even in small-scale, synthetic settings. In this work, we practically explore this newly identified axis of ICL quality. We propose Concept-aware Training (CoAT), a framework for constructing training scenarios that make it beneficial for the LM to learn to utilize analogical reasoning concepts from demonstrations. We find that by using CoAT, pre-trained transformers can learn to better utilize new latent concepts from demonstrations, and that this ability makes ICL more robust to the functional deficiencies of previous models. Finally, we show that concept-aware in-context learning is more effective for a majority of new tasks when compared to traditional instruction tuning, resulting in performance comparable to previous in-context learners trained on orders of magnitude more data.
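To make the core idea concrete, below is a minimal sketch of how a concept-aware training instance might be assembled: demonstrations are selected so that they share a latent reasoning concept with the predicted example, which makes it beneficial for the model to extract and reuse that concept. The `Example` structure, its `concept` annotation, and the prompt format are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
# Sketch of concept-aware construction of a few-shot training instance.
# Assumption: each training example carries a latent-concept annotation.
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str
    concept: str  # latent reasoning concept label (assumed available)


def build_coat_instance(target: Example, pool: list[Example], k: int = 3,
                        rng: random.Random | None = None) -> tuple[str, str]:
    """Assemble one training instance whose demonstrations share the
    target's latent concept, so attending to the concept pays off."""
    rng = rng or random.Random(0)
    # Prefer demonstrations that share the target's concept (excluding the
    # target itself); fall back to random demonstrations if too few exist.
    same_concept = [ex for ex in pool
                    if ex.concept == target.concept and ex is not target]
    others = [ex for ex in pool if ex is not target]
    demos = rng.sample(same_concept if len(same_concept) >= k else others, k)
    prompt = "\n\n".join(f"Q: {ex.question}\nA: {ex.answer}" for ex in demos)
    prompt += f"\n\nQ: {target.question}\nA:"
    return prompt, target.answer


# Usage: the resulting (prompt, answer) pairs would then be fed to
# ordinary sequence-to-sequence fine-tuning of a pre-trained transformer.
pool = [
    Example("2 + 3 = ?", "5", "addition"),
    Example("7 + 1 = ?", "8", "addition"),
    Example("4 + 4 = ?", "8", "addition"),
    Example("9 - 2 = ?", "7", "subtraction"),
    Example("6 + 2 = ?", "8", "addition"),
]
prompt, answer = build_coat_instance(pool[4], pool, k=3)
print(prompt, "->", answer)
```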