In-context Learning (ICL) empowers large language models (LLMs) to swiftly adapt to unseen tasks at inference time by prefixing a few demonstration examples before queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is sensitive to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL operates by first generating a condensed vector representation, namely a context vector, extracted from the demonstration examples. It then conducts an inference-time intervention by injecting a linear combination of the context vector and query activations back into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot level performance at zero-shot inference cost, and it exhibits robustness against variations in demonstration examples. Furthermore, I2CL facilitates a novel representation of task-ids, enhancing task similarity detection and fostering effective transfer learning. We also perform a comprehensive analysis and ablation study on I2CL, offering deeper insights into its internal mechanisms. Code is available at https://github.com/LzVv123456/I2CL.
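The core mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the context vector is obtained by averaging per-layer residual-stream activations over demonstrations, and the injection coefficients (`lam_c`, `lam_q` here) are hypothetical names for the scalars weighting the linear combination.

```python
import numpy as np

def make_context_vector(demo_activations):
    """Condense demonstration examples into a single context vector.

    demo_activations: list of (num_layers, d_model) arrays, one per
    demonstration. Averaging is an assumed aggregation rule; the paper's
    exact extraction procedure may differ.
    """
    return np.mean(np.stack(demo_activations), axis=0)

def inject(query_activations, context_vector, lam_c=0.1, lam_q=1.0):
    """Inference-time intervention: per layer, form a linear combination
    of the context vector and the query's residual-stream activations.

    lam_c / lam_q are illustrative scalar coefficients (assumptions).
    """
    return lam_c * context_vector + lam_q * query_activations

# Toy usage with 4 layers and hidden size 8:
demos = [np.ones((4, 8)), 3.0 * np.ones((4, 8))]
context = make_context_vector(demos)        # elementwise mean -> all 2.0
query = np.zeros((4, 8))
patched = inject(query, context, lam_c=0.5) # 0.5 * 2.0 + 1.0 * 0.0 = 1.0
```

Because the injected quantity is a fixed vector per layer rather than a prefix of demonstration tokens, the query is processed at zero-shot sequence length, which is the source of the inference-cost reduction the abstract claims.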