Chain-of-thought fine-tuning aims to endow small student models with reasoning capacity, improving their performance on a specific task by having them imitate the reasoning procedure of large language models (LLMs) rather than simply predicting the answer to the question. However, existing methods 1) generate the rationale before the answer, making answer correctness sensitive to hallucination in the rationale; and 2) force the student model to repeat the exact rationale expression of the LLM word for word, which biases the model toward learning the surface expression of the rationale and hinders it from understanding the core logic behind it. We therefore propose a robust Post-Semantic-Thinking (PST) strategy that generates the answer before the rationale. Thanks to this answer-first setting, 1) the answering procedure escapes the adverse effects of hallucinations in the rationale; 2) the complex reasoning procedure is tightly bound to the relatively concise answer, making reasoning over questions easier given the prior information in the answer; and 3) inference efficiency also benefits, since users can stop generation as soon as the answer is produced. Furthermore, PST loosens the constraint on the generated rationale: it need only be close to the LLM gold standard in the hidden semantic space rather than the vocabulary space, allowing the small student model to better comprehend the semantic reasoning logic of the rationale. Extensive experiments across 12 reasoning tasks demonstrate the effectiveness of PST.
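A minimal sketch of what such a training objective could look like (the function name, mean-pooling choice, and equal loss weighting here are illustrative assumptions, not the paper's actual implementation): the answer tokens, generated first, are supervised with standard cross-entropy, while the rationale is only pulled toward the teacher LLM's rationale in a hidden semantic space via cosine similarity, rather than being matched token by token.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pst_loss(answer_logits, answer_ids, student_hidden, teacher_hidden):
    """Hypothetical sketch of an answer-first (PST-style) objective.

    answer_logits:  (batch, ans_len, vocab) student logits on answer tokens
    answer_ids:     (batch, ans_len)        gold answer token ids
    student_hidden: (batch, rat_len, dim)   student rationale hidden states
    teacher_hidden: (batch, rat_len, dim)   teacher LLM rationale hidden states
    """
    # 1) Token-level cross-entropy on the (short) answer span only.
    probs = softmax(answer_logits)
    b, t, _ = answer_logits.shape
    gold = probs[np.arange(b)[:, None], np.arange(t)[None, :], answer_ids]
    ce = -np.log(gold + 1e-12).mean()

    # 2) Semantic alignment: 1 - cosine similarity between mean-pooled
    #    rationale representations, instead of word-for-word imitation.
    s = student_hidden.mean(axis=1)
    te = teacher_hidden.mean(axis=1)
    cos = (s * te).sum(-1) / (
        np.linalg.norm(s, axis=-1) * np.linalg.norm(te, axis=-1) + 1e-12
    )
    align = 1.0 - cos.mean()
    return ce + align
```

Because the answer span is supervised independently of the rationale tokens, hallucinated rationale text cannot corrupt the answer loss, and at inference time decoding can halt after the answer span.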