Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance, such gains are only observed for sufficiently large LMs. Even more concerning, there is little guarantee that the generated rationales are consistent with the LM's predictions or faithfully justify its decisions. In this work, we propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger. To form better supervision, we elicit rationales supporting the gold answers from a large LM (the teacher) by contrastive decoding, which encourages the teacher to generate tokens that become more plausible only when the answer is considered. To ensure faithful distillation, we use the teacher-generated rationales to train a student LM with a counterfactual reasoning objective, which prevents the student from ignoring the rationales and making inconsistent predictions. Experiments show that, while yielding comparable end-task performance, our method generates CoT rationales that are more faithful than those of the baselines. Further analysis suggests that such a model respects its rationales more when making decisions; thus, refining its rationales improves its performance more than it does for the baselines.
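As an illustration of the contrastive decoding idea described above, the following is a minimal sketch of one possible token-selection rule, not the exact procedure used in this work. It assumes we already have next-token log-probabilities from the teacher under two contexts, one that includes the gold answer and one that does not. The function name `contrastive_next_token`, the dictionary inputs, and the plausibility cutoff `alpha` are all hypothetical choices made for this example.

```python
import math


def contrastive_next_token(logp_with_answer, logp_without_answer, alpha=0.1):
    """Pick the token whose log-probability rises most when the answer is in context.

    Both arguments map candidate tokens to log-probabilities (hypothetical inputs).
    alpha is an assumed plausibility cutoff relative to the most likely token.
    """
    # Keep only tokens that are reasonably likely when the answer is considered,
    # so that very rare tokens cannot win on the ratio alone.
    best = max(logp_with_answer.values())
    cutoff = best + math.log(alpha)
    candidates = [t for t, lp in logp_with_answer.items() if lp >= cutoff]

    # Score each remaining token by how much the answer raises its log-probability.
    return max(
        candidates,
        key=lambda t: logp_with_answer[t] - logp_without_answer[t],
    )


# Toy distributions (made-up numbers, for illustration only).
with_answer = {
    "the": math.log(0.5),
    "gravity": math.log(0.3),
    "because": math.log(0.199),
    "zebra": math.log(0.001),
}
without_answer = {
    "the": math.log(0.5),
    "gravity": math.log(0.05),
    "because": math.log(0.2),
    "zebra": math.log(0.0001),
}

print(contrastive_next_token(with_answer, without_answer))
```

In this toy case the rule prefers "gravity", whose probability rises sharply once the answer is in context, over "the", which is equally likely either way. The cutoff is a design choice of the sketch: without it, a very unlikely token such as "zebra" could be selected simply because its ratio happens to be large.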