The Role of Feedback Alignment in Self-Distillation

Conditioning a language model on additional context, such as feedback on a previous attempt, typically improves its response. Self-distillation trains the model to retain this improvement when the context is not present. The method works by matching the model's output distribution under two settings: a student that sees only the question, and a self-teacher that also sees the context. What the model learns therefore depends on what context the self-teacher receives, yet the design of this context remains largely unexplored. We study context design for self-distillation by training a solver on feedback from a frozen critic. We compare three conditions: (i) a binary reward (GRPO), (ii) the reference solution, and (iii) a step-by-step critique aligned to the solver's reasoning trace. Step-aligned critique yields the largest gains, outperforming GRPO by 16.11 points and reference-solution-conditioned self-distillation by 5.27 points (Avg@12). Per-token advantage analysis reveals why: step-aligned feedback targets only the tokens where reasoning fails, leaving correct behavior intact. Conditioning on the reference solution, by contrast, pressures the model to change its behavior at every token (even correct steps) because an alternative derivation inevitably differs in phrasing and approach. This suggests that structural alignment between feedback and the solver's reasoning is a key driver of self-distillation effectiveness.
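To make the distribution-matching idea concrete, the following is a minimal sketch with toy numbers, not the paper's implementation. It assumes the mismatch between the student and the self-teacher is measured as a per-token KL divergence from teacher to student; the abstract does not state the exact objective or how the per-token advantage is defined. The function names, the logits, and the two hypothetical teachers are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_kl(teacher_logits, student_logits):
    """Return KL(teacher || student) at each token position.

    Assumption: KL in this direction stands in for the training signal;
    the source does not specify the divergence actually used.
    """
    kls = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        kls.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
    return kls


# Toy setup: 3 token positions, vocabulary of 4 tokens (all values made up).
student = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]

# Hypothetical teacher conditioned on a step-aligned critique:
# it agrees with the student except at the last (faulty) position.
teacher_step_aligned = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]

# Hypothetical teacher conditioned on a reference solution:
# an alternative derivation that disagrees at every position.
teacher_reference = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]

kl_step = per_token_kl(teacher_step_aligned, student)
kl_ref = per_token_kl(teacher_reference, student)

print("step-aligned:", [round(k, 3) for k in kl_step])
print("reference:   ", [round(k, 3) for k in kl_ref])
```

In this toy setting the step-aligned teacher produces a nonzero signal only at the position where it disagrees with the student, while the reference-conditioned teacher produces a nonzero signal at every position. That mirrors the qualitative contrast described in the abstract, under the stated assumptions.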

