In-context learning (ICL) adapts LLMs by providing demonstrations without fine-tuning the model parameters; however, it treats all demonstrations uniformly and increases the computational complexity of Transformer LLMs quadratically with context length, quickly exhausting memory. As a solution, we propose Mixtures of In-Context Learners (MoICL), a novel approach that treats subsets of demonstrations as experts and learns a weighting function to merge their output distributions based on a training set. In our experiments, we show performance improvements on 5 out of 7 classification datasets compared to a set of strong baselines (up to +13\% over ICL and LENS). Moreover, we improve the Pareto frontier of ICL, reducing the inference time needed to reach the same performance with fewer demonstrations. Finally, MoICL is more robust to out-of-domain (up to +11\%), imbalanced (up to +49\%), and noisy demonstrations (up to +38\%), and can filter such demonstrations out of datasets. Overall, MoICL is a more expressive approach to learning from demonstrations without exhausting the context window or memory.
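The core mixing step can be illustrated with a minimal sketch: each expert (a subset of demonstrations) yields an output distribution over labels, and learned scalar weights, normalized with a softmax, merge these distributions. The example values below are hypothetical, not results from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def moicl_mixture(expert_dists, weight_logits):
    """Merge per-expert output distributions with learned softmax weights.

    expert_dists:  (K, V) array, one probability distribution per expert
    weight_logits: (K,) learned scalar weights (trained on a held-out set)
    returns:       (V,) merged output distribution
    """
    w = softmax(weight_logits)  # mixture weights, sum to 1
    return w @ expert_dists     # convex combination of distributions

# Hypothetical example: 3 experts over a 2-class task.
expert_dists = np.array([
    [0.9, 0.1],   # expert 1 (e.g., clean demonstrations)
    [0.2, 0.8],   # expert 2 (e.g., noisy demonstrations)
    [0.6, 0.4],   # expert 3
])
weight_logits = np.array([2.0, -3.0, 0.5])  # learned; down-weights expert 2
mixed = moicl_mixture(expert_dists, weight_logits)
```

Because the merged distribution is a convex combination, it remains a valid probability distribution, and a large negative weight effectively filters out a harmful expert.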