Large language models (LLMs) have exhibited an emergent in-context learning (ICL) ability. However, ICL models that handle ordinary cases by processing the demonstration examples only once rarely extend to more complex tasks. This single-turn ICL is also inconsistent with the human decision-making process of learning from analogy. In this paper, we propose an effective and efficient two-stage framework that boosts ICL in LLMs by exploiting the dual form between Transformer attention and gradient-descent-based optimization. Concretely, we divide the ICL process into a "Deep-Thinking" stage and an inference stage. The "Deep-Thinking" stage performs iterative forward optimization over the demonstrations. By "thinking" through the demonstrations multiple times, it is expected to boost the reasoning abilities of LLMs at test time. It produces accumulated meta-gradients by manipulating the Key-Value matrices in the self-attention modules of the Transformer. The inference stage then takes only the test query as input, without concatenating the demonstrations, and applies the learned meta-gradients through attention to predict the output. Demonstrations are therefore not required at inference time, because they have already been learned and stored in the final meta-gradients. As a result, LLMs can be adapted to downstream tasks both effectively and efficiently. Extensive experiments on ten classification and multiple-choice datasets show that our method achieves substantially better accuracy and efficiency than standard ICL.
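The dual form between attention and gradient descent can be illustrated with a toy, pure-Python sketch. It assumes unnormalized linear attention (no softmax), a single head, and square projection matrices, and it omits the query's attention to itself (the zero-shot term). Under these assumptions, attending from a query over demonstration tokens equals multiplying the query by ΔW = Σ_i (W_V x_i)(W_K x_i)^T. That outer-product sum has the shape of a gradient-descent weight update and plays the role of the meta-gradient, so inference needs only the query and ΔW. The names `meta_gradient`, `deep_thinking`, and `infer` are hypothetical. The refinement rule inside `deep_thinking`, including its step size `eta`, is an illustrative stand-in for "thinking" the demonstrations multiple times. It is not the paper's actual update of the Key-Value matrices.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def outer(a: Vector, b: Vector) -> Matrix:
    """Outer product a b^T."""
    return [[x * y for y in b] for x in a]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def mat_add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Return a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def linear_attention_over_demos(w_k, w_q, w_v, demos, x_query) -> Vector:
    """Unnormalized linear attention of one query over demonstration tokens:
    sum_i v_i * (k_i . q)."""
    q = matvec(w_q, x_query)
    out = [0.0] * len(w_v)
    for x in demos:
        k, v = matvec(w_k, x), matvec(w_v, x)
        score = dot(k, q)
        out = [o + score * vi for o, vi in zip(out, v)]
    return out


def meta_gradient(w_k, w_v, demos) -> Matrix:
    """Dual form: Delta W = sum_i (W_V x_i)(W_K x_i)^T, an outer-product sum
    shaped like a gradient-descent weight update."""
    delta = zeros(len(w_v), len(w_k))
    for x in demos:
        delta = mat_add(delta, outer(matvec(w_v, x), matvec(w_k, x)))
    return delta


def deep_thinking(w_k, w_q, w_v, demos, steps: int, eta: float) -> Matrix:
    """HYPOTHETICAL iteration, not the paper's exact rule: accumulate
    meta-gradients over several passes, refining the demonstration states
    between passes. Assumes square projections so the residual update fits."""
    acc = zeros(len(w_v), len(w_k))
    hidden = [list(x) for x in demos]
    for _ in range(steps):
        acc = mat_add(acc, meta_gradient(w_k, w_v, hidden), scale=eta)
        # Re-read each demonstration through the accumulated meta-gradient
        # and add the result back residually (an assumed refinement rule).
        hidden = [
            [xi + ui for xi, ui in zip(x, matvec(acc, matvec(w_q, x)))]
            for x in demos
        ]
    return acc


def infer(w_q, acc: Matrix, x_query: Vector) -> Vector:
    """Inference stage: only the query is used; demonstrations live in `acc`."""
    return matvec(acc, matvec(w_q, x_query))


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    demos = [[1.0, 0.0], [0.0, 1.0]]
    query = [2.0, 3.0]
    print(linear_attention_over_demos(eye, eye, eye, demos, query))  # [2.0, 3.0]
    print(infer(eye, meta_gradient(eye, eye, demos), query))  # [2.0, 3.0]
```

Under these assumptions, both printed vectors coincide. This is the sense in which the demonstrations are "stored" in the meta-gradient and can be dropped from the input. In a real Transformer, attention uses softmax, so this linear-attention equivalence is only a relaxation.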