Large Language Models (LLMs) perform well on a variety of reasoning tasks, but their limited accessibility and large number of parameters hinder wide practical application. One promising approach is to distill the reasoning ability of LLMs into small models using the chain-of-thought reasoning paths the LLMs generate. In some cases, however, LLMs produce incorrect reasoning chains, especially on complex mathematical problems. Previous studies transfer knowledge only from positive samples and discard the synthesized data with wrong answers. In this work, we illustrate the merit of negative data and propose a model specialization framework that distills LLMs with negative samples in addition to positive ones. The framework consists of three progressive steps, spanning the training and inference stages, to absorb knowledge from negative data. We conduct extensive experiments on arithmetic reasoning tasks to demonstrate the role of negative data in distillation from LLMs.
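The contrast drawn above is between keeping only the teacher rationales whose final answer matches the gold label and also retaining the ones that do not. The following is a minimal sketch of that partition step only, assuming a hypothetical `Sample` record and a naive "last number in the rationale is the answer" extractor; neither comes from the paper, and the sketch does not implement the three-step framework itself, which the abstract does not specify.

```python
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """Hypothetical record for one teacher-generated training example."""
    question: str
    rationale: str    # chain-of-thought produced by the teacher LLM
    gold_answer: str  # reference answer from the dataset


def extract_answer(rationale: str) -> Optional[str]:
    """Assumed heuristic: treat the last number in the rationale as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", rationale.replace(",", ""))
    return numbers[-1] if numbers else None


def is_correct(pred: Optional[str], gold: str) -> bool:
    """Compare answers numerically so that '3.0' and '3' count as equal."""
    if pred is None:
        return False
    try:
        return Fraction(pred) == Fraction(gold)
    except ValueError:
        return False


def split_by_correctness(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Partition samples into positive (correct answer) and negative (wrong answer) sets.

    A positive-only pipeline would keep just the first list; the abstract's
    point is that the second list also carries useful signal.
    """
    positive: List[Sample] = []
    negative: List[Sample] = []
    for s in samples:
        target = positive if is_correct(extract_answer(s.rationale), s.gold_answer) else negative
        target.append(s)
    return positive, negative


if __name__ == "__main__":
    data = [
        Sample("What is 2 + 3?", "2 plus 3 is 5. The answer is 5", "5"),
        Sample("What is 4 * 6?", "4 times 6 is 20. The answer is 20", "24"),
    ]
    pos, neg = split_by_correctness(data)
    print(len(pos), len(neg))
```

In practice, answer extraction for math word problems is usually more careful than taking the last number; the heuristic here only keeps the example short.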