Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered by their huge computational requirements and closed-source nature. Recent research on advancing smaller open-source LMs by distilling knowledge from black-box LLMs has obtained promising results in instruction-following ability. However, reasoning ability, which is more challenging to foster, remains relatively unexplored. In this paper, we propose a tailored learning approach to distill such reasoning ability into smaller LMs, facilitating the democratization of this exclusive reasoning ability. Rather than merely employing the LLM as a data annotator, we exploit its potential as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher, which can then provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning, which motivates the student to learn from its own mistakes. Thanks to seamless integration with the multi-round learning paradigm, learning from self-reflection and learning from the LLM are both tailored to the student's learning status. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.
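The interactive multi-round paradigm described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `ToyStudent`, `toy_teacher`, `multi_round_learning`, the per-round budget of two items, and the arithmetic questions are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow: in each round the student exposes its mistakes, the teacher returns training data customized to those mistakes, and the student additionally records its own wrong answers as a crude proxy for self-reflection.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStudent:
    """Hypothetical stand-in for a small LM: it memorises trained answers."""
    known: dict = field(default_factory=dict)     # question -> learned answer
    rejected: dict = field(default_factory=dict)  # question -> own wrong answers

    def answer(self, question):
        # Questions never trained on get a default guess of 0.
        return self.known.get(question, 0)

    def train(self, teacher_data, own_mistakes):
        # Learning from the teacher: adopt the teacher-provided answers.
        for question, answer in teacher_data:
            self.known[question] = answer
        # Proxy for self-reflection: keep own wrong answers as negatives.
        for question, wrong in own_mistakes:
            self.rejected.setdefault(question, set()).add(wrong)


def toy_teacher(mistakes, gold, budget=2):
    """Hypothetical stand-in for the black-box LLM teacher.

    Returns training data only for questions the student got wrong,
    limited to `budget` items per round (an assumed query limit).
    """
    return [(q, gold[q]) for q, _ in mistakes[:budget]]


def multi_round_learning(student, gold, rounds):
    """Run the exposure/feedback loop; return the mistake count per round."""
    history = []
    for _ in range(rounds):
        # The student exposes its deficiencies on the training questions.
        mistakes = [(q, student.answer(q)) for q in gold
                    if student.answer(q) != gold[q]]
        history.append(len(mistakes))
        if not mistakes:
            break
        # The teacher tailors new training data to those deficiencies.
        teacher_data = toy_teacher(mistakes, gold)
        student.train(teacher_data, mistakes)
    return history


gold = {"1+1": 2, "2+3": 5, "4*2": 8, "9-3": 6}
student = ToyStudent()
history = multi_round_learning(student, gold, rounds=4)
print(history)
```

In this toy setting the number of mistakes shrinks from round to round because each batch of teacher data targets exactly the questions the student currently fails. A real system would replace the lookup table with a fine-tuned LM and the teacher with LLM-generated rationales.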