In-context learning (ICL) has revolutionized the capabilities of transformer models in NLP. In this project, we extend the understanding of the mechanisms underpinning ICL by exploring whether transformers can learn from sequential, non-textual function class data distributions. We introduce a novel sliding-window sequential function class and run our experiments on toy-sized transformers with a GPT-2 architecture. Our analysis indicates that these models can indeed leverage ICL when trained on non-textual sequential function classes. Additionally, our experiments with randomized y-label sequences highlight that transformers retain some ICL capabilities even when the label associations are obfuscated. The effective learning of our proposed tasks provides evidence that transformers can reason with and understand sequentiality encoded within function classes. Our results also show that performance deteriorates as label randomness increases, though less than one might expect, which suggests that learned sequentiality may be robust to label noise. Future research could examine how previous explanations of transformers, such as induction heads and task vectors, relate to sequentiality in ICL in these toy examples. Our investigation lays the groundwork for further research into how transformers process and perceive sequential data.