In-Context Learning (ICL) refers to the ability of large language models (LLMs) pretrained on massive amounts of data to learn specific tasks from examples given in the input prompt. ICL is notable for two reasons. First, it requires no modification of the LLM's internal structure. Second, it enables LLMs to perform a wide range of tasks/functions from only a few examples demonstrating the desired task. ICL opens up new ways to apply LLMs across more domains, but its underlying mechanisms remain poorly understood, making error correction and diagnosis extremely challenging. It is therefore imperative to better understand the limitations of ICL and how exactly LLMs support it. Inspired by the properties of ICL and the functional modules of LLMs, we propose the 'counting hypothesis' of ICL, which suggests that the encoding strategy of LLMs may underlie ICL, and we provide supporting evidence.