Large language models (LLMs) have achieved remarkable advancements in natural language processing. However, the massive scale and computational demands of these models present formidable challenges for practical deployment in resource-constrained environments. While techniques such as chain-of-thought (CoT) distillation have shown promise in distilling LLMs into small language models (SLMs), distilled SLMs may still inherit flawed reasoning and hallucinations from LLMs. To address these issues, we propose a twofold methodology. First, we introduce a novel method for distilling the self-evaluation capability of LLMs into SLMs, aiming to mitigate the adverse effects of flawed reasoning and hallucinations inherited from LLMs. Second, we advocate distilling more comprehensive thinking by incorporating multiple distinct CoTs and self-evaluation outputs, to ensure a more thorough and robust knowledge transfer into SLMs. Experiments on three NLP benchmarks demonstrate that our method significantly improves the performance of distilled SLMs, offering a new perspective for developing more effective and efficient SLMs in resource-constrained environments.