In-context Learning (ICL) empowers large language models (LLMs) to adapt to unseen tasks at inference time by prepending a few demonstration examples to test queries. Despite its versatility, ICL incurs substantial computational and memory overhead compared to zero-shot inference and is sensitive to the selection and ordering of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges of traditional ICL by absorbing demonstration examples within the activation space. I2CL first generates a condensed vector representation, namely a context vector, from the demonstration examples. It then integrates this context vector during inference by injecting a linear combination of the context vector and query activations into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot performance at zero-shot cost and is robust to variations in the demonstration examples. Furthermore, I2CL facilitates a novel representation of "task-ids", enhancing task similarity detection and enabling effective transfer learning. We provide a comprehensive analysis of I2CL, offering deeper insights into its mechanisms and broader implications for ICL. The source code is available at: https://github.com/LzVv123456/I2CL.
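The core mechanism described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes the context vector is obtained by averaging demonstration activations (the actual extraction procedure may differ), and the coefficients `lam_c` and `lam_q` are hypothetical scalars standing in for whatever calibration I2CL uses.

```python
def make_context_vector(demo_activations):
    """Condense per-demonstration residual-stream activations into a single
    context vector. Averaging is an illustrative aggregation choice."""
    n = len(demo_activations)
    dim = len(demo_activations[0])
    return [sum(a[i] for a in demo_activations) / n for i in range(dim)]


def inject(context_vector, query_activation, lam_c=0.1, lam_q=1.0):
    """Inject a linear combination of the context vector and the query's own
    activation back into the residual stream at a given layer."""
    return [lam_c * c + lam_q * q
            for c, q in zip(context_vector, query_activation)]
```

At inference time, `inject` would be applied to the residual stream of each transformer layer in place of prepending demonstrations, which is why the per-query cost matches zero-shot inference.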