Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise, training models to generate critiques directly, as in Critique Fine-Tuning (CFT), can lead to output-format drift and degradation of general capabilities. We propose Critique-Guided Distillation (CGD), a training framework that decouples critique consumption from critique generation. During fine-tuning, the student is trained to refine flawed responses conditioned on teacher critiques. CGD treats critiques as a \textit{training-time-only} supervision signal, encouraging internalization of error-aware reasoning: critiques guide learning but are absent at inference. Controlled ablations confirm that the resulting reasoning gains are directly driven by the specificity and relevance of the teacher's feedback. Across five model families, CGD consistently outperforms CFT and standard distillation on mathematical reasoning benchmarks, with an average improvement of 7\% and gains of up to +15.0\% on AMC23 and +12.2\% on MATH-500. On challenging competition problems such as AIME24 and AIME25, CGD achieves substantially higher Pass@1 and stronger performance at low Pass@k, indicating improved reasoning quality per sample. Importantly, CGD preserves general instruction-following capabilities, whereas CFT degrades them significantly ($-$21.3\% on IFEval). These results position CGD as a practical and compute-efficient intermediate training paradigm for reasoning-centric tasks that introduces no architectural or inference-time overhead.
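The split between critique consumption at training time and critique-free inference can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record fields, the prompt templates, and the function names are hypothetical and do not come from the paper, and the sketch assumes that at inference the model receives only the bare question.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CGDExample:
    """Hypothetical training record (field names are illustrative assumptions)."""
    prompt: str             # the original question
    flawed_response: str    # an initial, possibly incorrect answer
    teacher_critique: str   # teacher feedback on the flawed answer
    refined_response: str   # target the student is trained to produce


def build_training_pair(ex: CGDExample) -> Tuple[str, str]:
    """Return (model_input, target) for one fine-tuning step.

    The critique appears only in the input; the target is the refined
    answer, so the student consumes critiques but never generates them.
    """
    model_input = (
        f"Question:\n{ex.prompt}\n\n"
        f"Initial answer:\n{ex.flawed_response}\n\n"
        f"Critique:\n{ex.teacher_critique}\n\n"
        "Refined answer:\n"
    )
    return model_input, ex.refined_response


def build_inference_input(prompt: str) -> str:
    """Assumed inference-time input: the question alone, with no critique."""
    return f"Question:\n{prompt}\n\nAnswer:\n"
```

Under this assumed layout, a standard supervised loss would be computed on the target tokens only, which is one way to keep critique text out of what the student learns to emit.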