Chain-of-thought finetuning aims to endow small student models with reasoning capacity and improve their performance on a specific task by having them imitate the reasoning procedure of large language models (LLMs), rather than simply predicting the answer to the question. However, existing methods 1) generate the rationale before the answer, which makes answer correctness sensitive to hallucinations in the rationale; and 2) force the student model to reproduce the LLM's rationale word for word, which may bias the model toward learning the surface expression of the rationale and keep it from understanding the core logic behind it. We therefore propose a robust Post-Semantic-Thinking (PST) strategy that generates the answer before the rationale. Thanks to this answer-first setting, 1) the answering procedure can escape the adverse effects of hallucinations in the rationale; 2) the complex reasoning procedure is tightly bound to the relatively concise answer, so the prior information in the answer makes reasoning about the question easier; and 3) the method is more efficient, since at inference time users can stop generation as soon as the answer has been output. Furthermore, PST relaxes the constraint on the generated rationale: instead of matching the LLM's gold-standard rationale in the vocabulary space, it only needs to be close to it in the hidden semantic space, which helps the small student model better comprehend the semantic reasoning logic in the rationale. Extensive experiments across 12 reasoning tasks demonstrate the effectiveness of PST.
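The two ingredients described above, answer-first targets and a semantic-space constraint on the rationale, can be illustrated with a minimal pure-Python sketch. It is not the PST implementation: the `Answer:`/`Rationale:` delimiters, the helper names, and the choice of mean pooling plus cosine distance as the "semantic space" loss are all hypothetical assumptions made for illustration. In a real setup the hidden vectors would come from a model rather than hand-written lists.

```python
import math

# Hypothetical delimiters; the abstract does not specify a target format.
ANSWER_TAG = "Answer:"
RATIONALE_TAG = "Rationale:"


def build_target(answer: str, rationale: str, answer_first: bool = True) -> str:
    """Build a finetuning target in answer-first (PST-style) or rationale-first order."""
    if answer_first:
        return f"{ANSWER_TAG} {answer}\n{RATIONALE_TAG} {rationale}"
    return f"{RATIONALE_TAG} {rationale}\n{ANSWER_TAG} {answer}"


def extract_answer_early(generated: str) -> str:
    """In answer-first output the answer is complete once the rationale tag appears,
    so decoding could stop there; here we just cut a finished string at that tag."""
    body = generated.split(RATIONALE_TAG, 1)[0]
    return body.replace(ANSWER_TAG, "", 1).strip()


def mean_pool(hidden_states):
    """Average a list of equal-length hidden vectors into one vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]


def cosine_distance(u, v) -> float:
    """Return 1 - cosine similarity (0.0 when u and v point the same way)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_rationale_loss(student_hidden, teacher_hidden) -> float:
    """Assumed semantic-space loss: compare pooled rationale representations
    instead of requiring token-by-token agreement with the LLM rationale."""
    return cosine_distance(mean_pool(student_hidden), mean_pool(teacher_hidden))


if __name__ == "__main__":
    target = build_target("B", "Ice is frozen water, so it melts when heated.")
    print(extract_answer_early(target))  # B
    # Different token-level vectors, same pooled direction -> zero loss.
    print(semantic_rationale_loss([[1.0, 0.0], [1.0, 0.0]],
                                  [[2.0, 0.0], [0.0, 0.0]]))  # 0.0
```

The usage lines show the design point the abstract makes: the student is penalized only when its pooled rationale representation points in a different direction from the teacher's, so two rationales with different wording can still incur zero loss. A token-level cross-entropy objective would not allow this.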