Large language models (LLMs) have acquired in-context learning (ICL) ability as they scale up, allowing them to adapt quickly to downstream tasks with only a few demonstration examples prepended to the input sequence. However, current ICL practice treats all demonstration examples equally, which leaves room for improvement because example quality is usually uneven. In this paper, we investigate how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. To assess the quality of weights in the absence of additional validation data, we design a masked self-prediction (MSP) score that exhibits a strong correlation with the final ICL performance. To expedite the weight-searching process, we discretize the continuous weight space and adopt beam search. With approximately optimal weights obtained, we further propose two strategies to apply them to demonstrations at different model positions. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin. Our code is publicly available at https://github.com/Zhe-Young/WICL.
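The weight search described above (discretize the weight space, then run beam search guided by a score) can be illustrated with a minimal sketch. This is not the released WICL implementation: the function name `beam_search_weights`, the candidate weight grid, the default weight used to pad unassigned positions, and the `toy_score` stand-in are all assumptions made for illustration. In the actual method the score would be the MSP score computed with an LLM, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def beam_search_weights(
    num_examples: int,
    candidate_weights: Sequence[float],
    score_fn: Callable[[Tuple[float, ...]], float],
    beam_width: int = 2,
    default_weight: float = 1.0,
) -> Tuple[float, ...]:
    """Assign one discrete weight per demonstration, left to right.

    Hypothetical sketch: unassigned positions are padded with
    `default_weight` so that `score_fn` always sees a full weight vector.
    """
    beams: List[Tuple[float, ...]] = [()]
    for position in range(num_examples):
        expanded = []
        padding = (default_weight,) * (num_examples - position - 1)
        for prefix in beams:
            for w in candidate_weights:
                partial = prefix + (w,)
                expanded.append((score_fn(partial + padding), partial))
        # Keep only the highest-scoring partial assignments.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = [partial for _, partial in expanded[:beam_width]]
    return beams[0]


def toy_score(weights: Tuple[float, ...]) -> float:
    """Stand-in for an MSP-style score (assumption): higher is better,
    peaking at an arbitrary hidden target chosen for this demo."""
    target = (0.8, 1.2, 1.0, 0.9)
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best = beam_search_weights(
    num_examples=4,
    candidate_weights=[0.8, 0.9, 1.0, 1.1, 1.2],  # hypothetical grid
    score_fn=toy_score,
    beam_width=2,
)
print(best)
```

With a grid of 5 candidate weights and 4 demonstrations, exhaustive search would score 5^4 = 625 weight vectors, whereas this sketch scores 5 candidates at the first position and at most 2 × 5 = 10 at each later position.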