Large Language Models exhibit impressive reasoning capabilities across diverse tasks, motivating efforts to distill these capabilities into smaller models through generated reasoning data. However, direct training on such synthesized reasoning data may lead to superficial imitation of the reasoning process, rather than fostering a genuine integration of reasoning capabilities with underlying knowledge. To address this, we propose TinyThinker, a framework introducing two novel approaches. First, we introduce a three-stage process that incrementally guides the student model through the reasoning process, progressively refining knowledge from coarse to fine granularity. Second, we develop a two-phase training framework comprising an initial reasoning acquisition phase followed by a self-reflection phase utilizing self-generated data. Experiments on commonsense reasoning benchmarks demonstrate that TinyThinker achieves superior performance compared to baselines. Ablation studies further validate the effectiveness of each component in our framework. TinyThinker is extendable to other knowledge-intensive reasoning tasks, offering an alternative strategy for developing effective reasoning capabilities in smaller language models. Code is available at https://github.com/shengminp/TinyThinker