Large language models (LLMs) have shown remarkable capabilities in complex reasoning through chain-of-thought (CoT) prompting. Recently, there has been growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both diversity and consistency in the rationales remains a challenge. In this paper, we focus on these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between their answer distributions. We evaluate MCC-KD with different model architectures (LLaMA/FlanT5) and model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
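To make the consistency objective concrete, the following is a minimal sketch of a bidirectional KL-divergence between two answer distributions, each obtained from a different rationale for the same question. It is an illustration under stated assumptions, not the paper's implementation: the function names (`softmax`, `kl_divergence`, `bidirectional_kl`), the use of raw answer logits as input, the smoothing constant `eps`, and the unweighted sum of the two KL directions are all hypothetical choices made here.

```python
import math


def softmax(logits):
    """Convert raw answer logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is an assumed smoothing constant that guards against log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(logits_a, logits_b):
    """Symmetric consistency term: KL(p || q) + KL(q || p).

    logits_a and logits_b are the student's answer logits when conditioned
    on two different rationales for the same question (assumed setup).
    Summing the two directions without weights is an assumption.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Two hypothetical answer-logit vectors over the same four answer options.
    rationale_1 = [2.0, 0.5, -1.0, 0.1]
    rationale_2 = [1.5, 0.7, -0.5, 0.0]
    print(bidirectional_kl(rationale_1, rationale_2))
```

The term is zero when the two rationales lead to identical answer distributions and grows as they diverge, so minimizing it pushes the student toward the same prediction regardless of which rationale it follows. In training it would typically be combined with a standard supervised loss; how the two are weighted is not specified in the abstract.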