Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance, such gains are only observed for sufficiently large LMs. Even more concerning, there is little guarantee that the generated rationales are consistent with the LM's predictions or faithfully justify its decisions. In this work, we propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger. To form better supervision, we elicit rationales supporting the gold answers from a large LM (teacher) by contrastive decoding, which encourages the teacher to generate tokens that become more plausible only when the answer is considered. To ensure faithful distillation, we use the teacher-generated rationales to train a student LM with a counterfactual reasoning objective, which prevents the student from ignoring the rationales and making inconsistent predictions. Experiments show that, while yielding comparable end-task performance, our method generates CoT rationales that are more faithful than those of the baselines. Further analysis suggests that such a model respects the rationales more when making decisions; thus, its performance can be improved further by refining its rationales.
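To illustrate the contrastive decoding idea, the following is a minimal sketch of a single decoding step, not the exact procedure used in this work. It assumes we already have the teacher's next-token log-probabilities under two prompts, one that includes the gold answer and one that does not. The function name `contrastive_next_token`, the plausibility threshold `alpha`, and the toy distributions are all hypothetical and chosen only for illustration.

```python
import math


def contrastive_next_token(logp_with_answer, logp_without_answer, alpha=0.1):
    """Pick the token whose log-probability rises most when the answer is given.

    Both arguments map token -> log-probability from the same (hypothetical)
    teacher, computed with and without the gold answer in the prompt.
    """
    # Plausibility filter (an assumption of this sketch): keep only tokens
    # whose probability with the answer is at least alpha times the top one.
    cutoff = max(logp_with_answer.values()) + math.log(alpha)
    candidates = [t for t, lp in logp_with_answer.items() if lp >= cutoff]

    # Contrastive score: how much more plausible the token becomes once the
    # answer is considered.
    def score(token):
        return logp_with_answer[token] - logp_without_answer[token]

    return max(candidates, key=score)


# Toy next-token distributions (made-up numbers, not experimental data).
p_with = {"the": 0.5, "gravity": 0.399, "banana": 0.1, "zzz": 0.001}
p_without = {"the": 0.6, "gravity": 0.1, "banana": 0.2999, "zzz": 0.0001}
logp_with = {t: math.log(p) for t, p in p_with.items()}
logp_without = {t: math.log(p) for t, p in p_without.items()}

greedy_choice = max(logp_with, key=logp_with.get)
contrastive_choice = contrastive_next_token(logp_with, logp_without)
```

In this toy case, greedy decoding picks the generic token `the`, while the contrastive score picks `gravity`, the token that gains the most probability once the answer is in the context. The plausibility filter keeps the rare token `zzz` from winning on its ratio alone, which is one common way to stabilize this kind of scoring.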