Supervised Fine-Tuning (SFT) specializes model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models at inference time through instructions or demonstrations in the prompt. In data-scarce settings, ICL can offer better generalizability and more calibrated responses than SFT, at the cost of more inference compute. In this work, we ask: can ICL's internal computations be used to improve the qualities of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation, and to make use of ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique that aims to replicate ICL's activation patterns in SFT models and incentivizes ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by our extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful but also offers a conceptual window into the inner mechanics of model adaptation.
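To make the idea of aligning activations concrete, the following is a minimal sketch of one possible alignment objective. It is not the IA2 implementation: the abstract does not state the loss, the layers, or the token positions used, so the mean-squared-error form, the function name `activation_alignment_loss`, and the array layout are all assumptions made for illustration. The sketch compares activations recorded with demonstrations in the prompt against activations recorded on the query alone, restricted to the query positions.

```python
import numpy as np


def activation_alignment_loss(icl_acts, plain_acts, n_context_tokens):
    """Mean squared error between ICL and query-only activations.

    Assumed (hypothetical) layout:
      icl_acts:   (layers, n_context_tokens + n_query_tokens, hidden),
                  recorded with demonstrations prepended to the query.
      plain_acts: (layers, n_query_tokens, hidden),
                  recorded on the query alone.
    """
    # Drop the demonstration positions so both arrays cover the same tokens.
    icl_query_acts = icl_acts[:, n_context_tokens:, :]
    if icl_query_acts.shape != plain_acts.shape:
        raise ValueError("query-token activations must have matching shapes")
    return float(np.mean((icl_query_acts - plain_acts) ** 2))


# Toy data standing in for recorded activations (no real model involved).
rng = np.random.default_rng(0)
layers, n_ctx, n_query, hidden = 2, 5, 3, 4
icl_acts = rng.normal(size=(layers, n_ctx + n_query, hidden))

aligned = icl_acts[:, n_ctx:, :].copy()  # identical to the ICL query activations
shifted = aligned + 0.5                  # uniformly offset from them

print(activation_alignment_loss(icl_acts, aligned, n_ctx))
print(activation_alignment_loss(icl_acts, shifted, n_ctx))
```

In a training setting, a quantity of this kind would be minimized with respect to the weights of the model that sees only the query, with the ICL activations treated as fixed targets; the subsequent SFT step would then train on the target responses as usual.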