In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to develop Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Unlike the standard $N$-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.
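The aggregation step described above can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a single linear-attention layer in which each demonstration contributes a meta-gradient equal to the outer product of its value and key projections, and it assumes the aggregation is a plain mean. The names `one_shot_meta_gradient`, `batch_icl_output`, `W_zsl`, `W_V`, and `W_K` are hypothetical, and the weights are random placeholders rather than a real LLM.

```python
import numpy as np


def one_shot_meta_gradient(W_V, W_K, x):
    """Meta-gradient from one demonstration under the assumed
    linear-attention view: outer product of its value and key."""
    return np.outer(W_V @ x, W_K @ x)


def batch_icl_output(W_zsl, W_V, W_K, demos, q):
    """Apply aggregated 1-shot meta-gradients to a zero-shot query.

    Each demonstration is processed independently (N separate 1-shot
    computations), so no demonstration ever sees another one.
    """
    grads = [one_shot_meta_gradient(W_V, W_K, x) for x in demos]
    delta_W = np.mean(grads, axis=0)  # assumption: aggregate by averaging
    return (W_zsl + delta_W) @ q      # zero-shot weights plus the update


# Toy setup with random placeholder weights (hypothetical, not a real model).
rng = np.random.default_rng(0)
d = 8
W_zsl = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))
W_K = rng.normal(size=(d, d))
demos = rng.normal(size=(4, d))  # N = 4 demonstration representations
q = rng.normal(size=d)           # query representation

out = batch_icl_output(W_zsl, W_V, W_K, demos, q)
out_perm = batch_icl_output(W_zsl, W_V, W_K, demos[::-1], q)

# Averaging is symmetric in its inputs, so reordering the demonstrations
# leaves the result unchanged up to floating-point rounding.
print(np.allclose(out, out_perm))
```

Because the mean does not depend on the order of its inputs, the sketch yields the same output for any permutation of `demos`, which is the order-agnostic property the abstract claims. It does not model the causal attention between demonstrations that standard $N$-shot prompting involves.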