Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models

Chain-of-thought finetuning aims to endow small student models with reasoning capacity, improving their performance on a specific task by having them imitate the reasoning procedure of large language models (LLMs) rather than simply predicting the answer to the question. However, existing methods 1) generate the rationale before the answer, making answer correctness sensitive to hallucinations in the rationale; and 2) force the student model to repeat the LLM's rationale word for word, which can bias the model towards learning the surface expression of the rationale and hinder it from understanding the core logic behind it. Therefore, we propose a robust Post-Semantic-Thinking (PST) strategy that generates the answer before the rationale. Thanks to this answer-first setting, 1) the answering procedure escapes the adverse effects of hallucinations in the rationale; 2) the complex reasoning procedure is tightly bound to the relatively concise answer, so the prior information in the answer makes reasoning about the question easier; and 3) the method is more efficient, since users can stop generation right after the answer is output at inference time. Furthermore, the PST strategy loosens the constraint on the generated rationale, requiring it to be close to the LLM's gold-standard rationale in the hidden semantic space instead of the vocabulary space, which helps the small student model better comprehend the semantic reasoning logic in the rationale. Extensive experiments across 12 reasoning tasks demonstrate the effectiveness of PST.
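The two ideas in the abstract, answer-first targets and a rationale constraint in a hidden semantic space, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the target template, the pooling method, or the exact loss, so the `Answer:`/`Rationale:` template, mean pooling, cosine distance, and all function names below are assumptions chosen for illustration. Hidden states are represented as plain lists of floats so that the example runs with the standard library alone.

```python
import math

ANSWER_PREFIX = "Answer: "        # hypothetical template, not from the paper
RATIONALE_PREFIX = "Rationale: "  # hypothetical template, not from the paper


def build_target(answer, rationale):
    """Build an answer-first training target: the answer precedes the rationale."""
    return f"{ANSWER_PREFIX}{answer}\n{RATIONALE_PREFIX}{rationale}"


def answer_only(generated):
    """Early stopping at inference: keep only the first line, which holds the answer."""
    first_line = generated.split("\n", 1)[0]
    if first_line.startswith(ANSWER_PREFIX):
        return first_line[len(ANSWER_PREFIX):]
    return first_line


def mean_pool(hidden_states):
    """Average per-token vectors into a single sequence-level vector (assumed pooling)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(vec[i] for vec in hidden_states) / n for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_rationale_loss(student_hidden, teacher_hidden):
    """Compare rationales in hidden space instead of token by token.

    The two rationales may differ in length and wording; only their pooled
    representations are required to be close (assumed loss form).
    """
    return cosine_distance(mean_pool(student_hidden), mean_pool(teacher_hidden))
```

Because the loss compares pooled vectors, a student rationale with a different number of tokens than the teacher's can still reach a loss near zero, which a word-for-word cross-entropy objective would not allow. In a real setup the vectors would come from the models' hidden layers rather than hand-written lists.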

