As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thought (CoT) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with only a small proportion ($\approx 4.7\%$) of key reasoning steps that truly influence the conclusion. However, previous distillation methods typically perform supervised fine-tuning of student SLMs only on correct CoT data produced by teacher LLMs, so students struggle to learn the key reasoning steps and instead imitate the teacher's reasoning forms, making errors or omissions on precisely these steps. To address these issues, and drawing an analogy to human learning, where analyzing mistakes against correct solutions often reveals the crucial steps leading to success or failure, we propose mistak\textbf{E}-\textbf{D}riven key reason\textbf{I}ng step distilla\textbf{T}ion (\textbf{EDIT}), a novel method that further helps SLMs learn key reasoning steps rather than merely fine-tuning on correct CoTs. First, to expose these crucial steps, we design specific prompts to generate dual CoT data with similar reasoning paths but divergent conclusions. We then apply a minimum edit distance algorithm to the dual CoT data to locate the key steps and optimize their likelihood. Extensive experiments validate the effectiveness of EDIT on both in-domain and out-of-domain benchmark reasoning datasets. Further analysis shows that EDIT generates high-quality CoTs with more correct key reasoning steps. Notably, we also explore how different mistake patterns affect performance and find that EDIT benefits more from logical errors than from knowledge or mathematical calculation errors in dual CoTs\footnote{Code can be found at \url{https://github.com/C-W-D/EDIT}}.
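To illustrate the key-step-locating idea, here is a minimal sketch, not the paper's implementation: it aligns the two chains of a dual-CoT pair with Python's standard-library `difflib.SequenceMatcher` (used here as a stand-in for the minimum edit distance alignment described above) and returns the steps where the correct and mistaken chains diverge. The function name and the example step strings are illustrative assumptions.

```python
from difflib import SequenceMatcher

def locate_key_steps(correct_steps, mistaken_steps):
    """Align two reasoning-step sequences and return the divergent steps.

    Steps that appear unchanged in both chains are treated as generic
    reasoning forms; replaced/inserted/deleted steps are candidate
    "key reasoning steps" that flipped the conclusion.
    """
    matcher = SequenceMatcher(a=correct_steps, b=mistaken_steps, autojunk=False)
    key_correct, key_mistaken = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # step was replaced, inserted, or deleted
            key_correct.extend(correct_steps[i1:i2])
            key_mistaken.extend(mistaken_steps[j1:j2])
    return key_correct, key_mistaken

# Dual CoTs sharing a reasoning path but ending in divergent conclusions:
correct = ["Add 2 and 3 to get 5", "Multiply 5 by 4 to get 20", "Answer: 20"]
mistaken = ["Add 2 and 3 to get 5", "Multiply 5 by 4 to get 25", "Answer: 25"]
key_c, key_m = locate_key_steps(correct, mistaken)
```

In a training pipeline, the likelihood of the tokens in `key_c` (the correct chain's divergent steps) would then be up-weighted in the distillation loss, rather than weighting all steps of the CoT uniformly.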