Chain-of-thought fine-tuning aims to endow small student models with reasoning capacity, improving their performance on a specific task by having them imitate the reasoning procedure of large language models (LLMs) rather than simply predicting the answer to the question. However, existing methods 1) generate the rationale before the answer, making answer correctness sensitive to hallucination in the rationale; and 2) force the student model to repeat the exact rationale expression of the LLM word for word, which biases the model toward learning the surface expression of the rationale and hinders it from understanding the core logic behind it. We therefore propose a robust Post-Semantic-Thinking (PST) strategy that generates the answer before the rationale. Thanks to this answer-first setting, 1) the answering procedure escapes the adverse effects of hallucinations in the rationale; 2) the complex reasoning procedure is tightly bound to the relatively concise answer, making reasoning over questions easier given the prior information in the answer; and 3) inference efficiency also benefits, since users can stop generation as soon as the answer is produced. Furthermore, PST loosens the constraint on the generated rationale: it need only be close to the LLM gold standard in the hidden semantic space rather than the vocabulary space, allowing the small student model to better comprehend the semantic reasoning logic of the rationale. Extensive experiments across 12 reasoning tasks demonstrate the effectiveness of PST.
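A minimal sketch of what such a training objective could look like (the function name, mean-pooling choice, and equal loss weighting here are illustrative assumptions, not the paper's actual implementation): the answer tokens, generated first, are supervised with standard cross-entropy, while the rationale is only pulled toward the teacher LLM's rationale in a hidden semantic space via cosine similarity, rather than being matched token by token.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pst_loss(answer_logits, answer_ids, student_hidden, teacher_hidden):
    """Hypothetical sketch of an answer-first (PST-style) objective.

    answer_logits:  (batch, ans_len, vocab) student logits on answer tokens
    answer_ids:     (batch, ans_len)        gold answer token ids
    student_hidden: (batch, rat_len, dim)   student rationale hidden states
    teacher_hidden: (batch, rat_len, dim)   teacher LLM rationale hidden states
    """
    # 1) Token-level cross-entropy on the (short) answer span only.
    probs = softmax(answer_logits)
    b, t, _ = answer_logits.shape
    gold = probs[np.arange(b)[:, None], np.arange(t)[None, :], answer_ids]
    ce = -np.log(gold + 1e-12).mean()

    # 2) Semantic alignment: 1 - cosine similarity between mean-pooled
    #    rationale representations, instead of word-for-word imitation.
    s = student_hidden.mean(axis=1)
    te = teacher_hidden.mean(axis=1)
    cos = (s * te).sum(-1) / (
        np.linalg.norm(s, axis=-1) * np.linalg.norm(te, axis=-1) + 1e-12
    )
    align = 1.0 - cos.mean()
    return ce + align
```

Because the answer span is supervised independently of the rationale tokens, hallucinated rationale text cannot corrupt the answer loss, and at inference time decoding can halt after the answer span.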