In-context learning has become a popular paradigm in natural language processing. However, its performance can be significantly influenced by the order of in-context demonstration examples. In this paper, we find that causal language models (CausalLMs) are more sensitive to this order than prefix language models (PrefixLMs). We attribute this phenomenon to the auto-regressive attention masks within CausalLMs, which prevent each token from accessing information in subsequent tokens. As a result, samples at different positions have different receptive fields, leading to representation disparities across positions. To tackle this challenge, we introduce an unsupervised fine-tuning method, termed the Information-Augmented and Consistency-Enhanced approach. This approach uses contrastive learning to align representations of in-context examples across different positions and introduces a consistency loss to ensure similar representations for inputs under different permutations, thereby improving the model's predictive consistency across permutations. Experimental results on five benchmarks suggest that our proposed method can reduce the sensitivity of CausalLMs to the order of in-context examples and exhibit robust generalizability, particularly when demonstrations are sourced from a candidate pool different from that used in the training phase, or when the number of in-context examples differs from that used during training.
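To illustrate the core idea, the following is a minimal toy sketch (not the paper's actual model or training code): a stand-in `encode` function mimics how a causal mask makes each example's representation depend on its position, and a hypothetical `consistency_loss` penalizes the resulting disparity between pooled representations of different permutations of the same examples. All function names and the scalar "representations" are illustrative assumptions.

```python
import random

def encode(examples):
    """Toy stand-in for a CausalLM: each example's representation depends on
    its position, since only preceding examples are visible under the causal
    mask. Here a 'representation' is the running mean of values seen so far."""
    reps, total = [], 0.0
    for pos, x in enumerate(examples):
        total += x                      # only earlier examples contribute
        reps.append(total / (pos + 1))  # position-dependent representation
    return reps

def pooled(reps):
    """Mean-pool per-example representations into one sequence representation."""
    return sum(reps) / len(reps)

def consistency_loss(examples, n_perms=4, seed=0):
    """Variance of pooled representations across random permutations of the
    same in-context examples; zero iff the encoding is order-invariant."""
    rng = random.Random(seed)
    perms = [examples[:]]
    for _ in range(n_perms - 1):
        p = examples[:]
        rng.shuffle(p)
        perms.append(p)
    pools = [pooled(encode(p)) for p in perms]
    mean = sum(pools) / len(pools)
    return sum((p - mean) ** 2 for p in pools) / len(pools)

# Reordering the same examples changes the pooled representation:
print(pooled(encode([1.0, 2.0])), pooled(encode([2.0, 1.0])))
```

Minimizing such a loss during fine-tuning pushes the model toward producing similar representations regardless of demonstration order, which is the consistency-enhancement component described above; the actual method applies this to transformer hidden states rather than toy scalars.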