Recent research has investigated the underlying mechanisms of in-context learning (ICL) both theoretically and empirically, often using data generated from simple function classes. However, existing work often focuses on sequences consisting solely of labeled examples, whereas in practice labeled examples are typically accompanied by an instruction that provides side information about the task. In this work, we propose ICL with hypothesis-class guidance (ICL-HCG), a novel synthetic data model for ICL in which the input context consists of the literal description of a (finite) hypothesis class $H$ and $(x,y)$ pairs from a hypothesis chosen from $H$. Under the ICL-HCG framework, we conduct extensive experiments to explore: (i) several types of generalization to new hypothesis classes; (ii) different model architectures; (iii) sample complexity; (iv) in-context data imbalance; (v) the role of instruction; and (vi) the effect of pretraining hypothesis diversity. We show that (a) Transformers can successfully learn ICL-HCG and generalize to unseen hypotheses and unseen hypothesis classes, and (b) compared with ICL without instruction, ICL-HCG achieves significantly higher accuracy, demonstrating the role of instructions.
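To make the data model described in the abstract concrete, the following is a minimal sketch of how one ICL-HCG-style context could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the input space, the label set, or how the hypothesis class is serialized. Here we assume binary labels over a small finite input space, represent each hypothesis as a tuple of labels, and use the hypothetical helper names `sample_hypothesis_class`, `build_context`, and `consistent_hypotheses`.

```python
import random
from itertools import product


def sample_hypothesis_class(num_inputs, class_size, rng):
    """Draw a finite hypothesis class H (assumed encoding).

    Each hypothesis is a tuple of binary labels, one per input index,
    so h[x] is the label that h assigns to input x.
    """
    universe = list(product([0, 1], repeat=num_inputs))
    # Sample without replacement so the hypotheses in H are distinct.
    return rng.sample(universe, class_size)


def build_context(hypothesis_class, num_examples, rng):
    """Assemble one context: the description of H plus (x, y) pairs.

    The pairs are generated by a single hypothesis chosen from H.
    """
    target = rng.choice(hypothesis_class)
    xs = [rng.randrange(len(target)) for _ in range(num_examples)]
    examples = [(x, target[x]) for x in xs]
    return {
        "hypothesis_class": hypothesis_class,  # the "instruction" part
        "examples": examples,                  # the labeled examples
        "target": target,                      # hidden from the learner
    }


def consistent_hypotheses(hypothesis_class, examples):
    """Return the hypotheses in H that agree with every labeled example."""
    return [
        h for h in hypothesis_class
        if all(h[x] == y for x, y in examples)
    ]


rng = random.Random(0)
H = sample_hypothesis_class(num_inputs=4, class_size=5, rng=rng)
ctx = build_context(H, num_examples=3, rng=rng)
survivors = consistent_hypotheses(ctx["hypothesis_class"], ctx["examples"])
print("H:", H)
print("examples:", ctx["examples"])
print("hypotheses still consistent:", survivors)
```

The `consistent_hypotheses` helper is only an intuition aid: when the class description is part of the context, the labeled examples need only single out one candidate within $H$, whereas without it the learner has no explicit candidate set. Whether a trained Transformer actually behaves this way is an empirical question that the experiments address.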