Large language models (LLMs) have shown impressive performance on downstream tasks through in-context learning (ICL), which relies heavily on the quality of demonstrations selected from a large set of annotated examples. Recent work claims that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significantly hurt the performance of in-context learning. To circumvent this issue, we propose a simple and effective approach called Local Perplexity Ranking (LPR), which replaces "noisy" candidates with their nearest neighbors that are more likely to be clean. Our method is motivated by an analysis of the perplexity deviation caused by noisy labels, in which we decompose perplexity into inherent perplexity and matching perplexity. The key idea behind LPR is thus to decouple the matching perplexity by ranking candidates among their neighbors in semantic space. Our approach prevents the selected demonstrations from including mismatched input-label pairs while preserving the effectiveness of the original selection methods. Extensive experiments demonstrate the effectiveness of LPR, which improves the exact match (EM) score by up to 18.75 on common benchmarks with noisy annotations.
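The ranking-and-replacement idea can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the function names `local_perplexity_rank` and `replace_noisy`, the use of cosine similarity for the neighbor search, the neighborhood size `k`, the `threshold` on the local rank, and the fallback rule. Embeddings and per-example perplexities are assumed to be precomputed; the toy values are made up.

```python
import numpy as np

def local_perplexity_rank(embeddings, perplexities, k=3):
    """For each example, return the fraction of its k nearest neighbors
    (by cosine similarity) whose perplexity is lower than its own,
    together with the neighbor indices (nearest first)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)              # never pick an example as its own neighbor
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    ranks = (perplexities[neighbors] < perplexities[:, None]).mean(axis=1)
    return ranks, neighbors

def replace_noisy(selected, embeddings, perplexities, k=3, threshold=0.5):
    """Swap each selected example whose local rank exceeds `threshold`
    (assumed rule) for its nearest neighbor that is not itself flagged."""
    ranks, neighbors = local_perplexity_rank(embeddings, perplexities, k)
    result = []
    for idx in selected:
        if ranks[idx] <= threshold:
            result.append(int(idx))              # looks clean locally: keep it
            continue
        clean = [n for n in neighbors[idx] if ranks[n] <= threshold]
        if clean:
            result.append(int(clean[0]))         # nearest neighbor that looks clean
        else:
            # Assumed fallback: the lowest-perplexity neighbor.
            best = neighbors[idx][np.argmin(perplexities[neighbors[idx]])]
            result.append(int(best))
    return result

# Toy data: two semantic clusters; example 2 has an unusually high perplexity.
embeddings = np.array([
    [1.0, 0.0], [1.0, 0.05], [1.0, 0.10], [1.0, 0.15],
    [0.0, 1.0], [0.05, 1.0], [0.10, 1.0], [0.15, 1.0],
])
perplexities = np.array([5.0, 5.2, 30.0, 5.1, 8.0, 8.3, 8.1, 8.2])

picked = replace_noisy([2, 4], embeddings, perplexities, k=3, threshold=0.9)
print(picked)
```

In this toy run, example 2 is compared only against its own cluster, so its high perplexity stands out locally even though the two clusters have different baseline perplexities. That is the intuition behind ranking within a semantic neighborhood rather than globally.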