In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing way of using LLMs. In this paper, we first expose a practical predicament of this typical usage: it cannot scale up with training data because of the context length restriction. Moreover, existing work has shown that ICL also suffers from various biases and requires delicate calibration. To address both challenges, we advocate a simple and effective solution, $k$NN Prompting, which first queries the LLM with training data to obtain distributed representations, then predicts test instances by simply referring to their nearest neighbors. We conduct comprehensive experiments to demonstrate its two-fold superiority: 1) Calibration-Free: $k$NN Prompting does not directly align the LLM output distribution with the task-specific label space; instead, it leverages this distribution to align test and training instances. It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios. 2) Beyond-Context: $k$NN Prompting can further scale up effectively with as much training data as is available, continually bringing substantial improvements. The scaling trend holds across training-set sizes ranging from 2 shots to 1024 shots ($2^1$ to $2^{10}$), as well as across LLM scales ranging from 0.8B to 30B parameters. It successfully bridges data scaling into model scaling, and brings new potential to the gradient-free paradigm of LLM deployment. Code is publicly available.
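The two-step procedure described in the abstract (store a distribution per training instance, then classify a test instance by its nearest neighbors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `kl_divergence` and `knn_predict` are hypothetical, the distance is taken to be KL divergence, the vote is a plain majority over `k` neighbors, and the toy three-entry distributions stand in for the LLM output distributions that a real system would obtain by querying the model.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps guards against log(0); the choice of KL as the distance is an
    assumption of this sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def knn_predict(query_dist, anchor_dists, anchor_labels, k=3):
    """Majority vote over the k anchors whose distributions are closest to the query."""
    ranked = sorted(
        range(len(anchor_dists)),
        key=lambda i: kl_divergence(query_dist, anchor_dists[i]),
    )
    votes = Counter(anchor_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy datastore (hypothetical values): in practice each row would be the
# LLM output distribution obtained by querying the model with one
# training instance, stored alongside that instance's label.
anchor_dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
anchor_labels = ["positive", "positive", "negative", "negative"]

# A test instance is represented the same way and matched against the store.
prediction = knn_predict([0.65, 0.25, 0.10], anchor_dists, anchor_labels, k=3)
print(prediction)
```

Because the datastore lives outside the prompt, adding more training instances only adds rows to `anchor_dists`, which is why this kind of lookup is not bounded by the context length.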