Large language models (LLMs) have recently shown great potential for in-context learning, in which an LLM learns a new task simply by conditioning on a few input-label pairs (prompts). Despite this potential, our understanding of the factors that influence end-task performance, and of the robustness of in-context learning, remains limited. This paper aims to close this gap by investigating how much LLMs rely on shortcuts, or spurious correlations, within prompts. Through comprehensive experiments on classification and extraction tasks, we show that LLMs are "lazy learners" that tend to exploit shortcuts in prompts for downstream tasks. We also uncover the surprising finding that larger models are more likely to use shortcuts in prompts during inference. Our findings offer a new perspective on evaluating the robustness of in-context learning and pose new challenges for detecting and mitigating the use of shortcuts in prompts.