Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks. This competency is attributed to their substantial parameter counts and pre-training on extensive corpora. Moreover, LLMs have exhibited enhanced capabilities on complex reasoning tasks through a method named ``Chain-of-Thought (CoT) prompting'', which generates intermediate reasoning steps that guide the inference of the final answer. However, these advanced reasoning abilities appear to emerge only in models with at least 10 billion parameters, which limits their usefulness where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This separation enables more efficient use of rationales during the answer inference stage, leading to improved performance on scientific question-answering tasks. Using Sci-CoT, our 80-million-parameter model exceeds the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting.
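The two-stage separation described above can be illustrated with a minimal sketch. This is not the Sci-CoT implementation: the prompt templates, the function names (`build_rationale_prompt`, `build_answer_prompt`, `two_stage_answer`), and the toy stand-in models are all assumptions introduced here for illustration. The only point it shows is that the rationale is produced first and then passed as extra input to a separate answer-inference step.

```python
from typing import Callable

# Hypothetical interface: any text-to-text model (e.g. a small distilled
# seq2seq model) can be wrapped as a function from prompt to output string.
TextModel = Callable[[str], str]


def format_options(choices: list[str]) -> str:
    """Label the answer choices as (A), (B), (C), ..."""
    return " ".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))


def build_rationale_prompt(question: str, choices: list[str]) -> str:
    # Assumed template for stage 1; the paper's actual template may differ.
    return f"Question: {question} Options: {format_options(choices)} Rationale:"


def build_answer_prompt(question: str, choices: list[str], rationale: str) -> str:
    # Assumed template for stage 2: the generated rationale becomes context.
    return (
        f"Question: {question} Options: {format_options(choices)} "
        f"Rationale: {rationale} Answer:"
    )


def two_stage_answer(
    question: str,
    choices: list[str],
    rationale_model: TextModel,
    answer_model: TextModel,
) -> tuple[str, str]:
    """Stage 1 generates a rationale; stage 2 infers the answer from it."""
    rationale = rationale_model(build_rationale_prompt(question, choices))
    answer = answer_model(build_answer_prompt(question, choices, rationale))
    return rationale, answer


# Toy stand-ins so the sketch runs without any trained model.
def toy_rationale_model(prompt: str) -> str:
    return "Plants use sunlight to make food through photosynthesis."


def toy_answer_model(prompt: str) -> str:
    # Picks (B) only when the rationale text is present in its input.
    return "B" if "photosynthesis" in prompt else "A"


rationale, answer = two_stage_answer(
    "Which energy source do plants use to make food?",
    ["Water", "Sunlight", "Soil", "Wind"],
    toy_rationale_model,
    toy_answer_model,
)
print(rationale)
print(answer)
```

In a real setting the two callables would wrap trained models; whether they share weights or are trained separately is left open here, since the abstract does not say.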