Chain-of-thought finetuning (CoT-finetuning) aims to endow small language models (SLMs) with reasoning ability, improving their performance on specific tasks by having them imitate the reasoning procedure of large language models (LLMs) rather than simply predicting answers. Most existing CoT-finetuning methods adopt a pre-thinking mechanism, in which the SLM generates a rationale before providing an answer. This mechanism enables the SLM to analyze and reason about complex questions, but it also makes answer correctness highly sensitive to minor errors in the rationale. We therefore propose a robust post-thinking mechanism that generates the answer before the rationale. Thanks to this answer-first setting, 1) the answer escapes the adverse effects of minor errors in the rationale; 2) the rationale acts as an error amplifier on the answer, pushing the SLM to focus on learning hard samples; and 3) inference efficiency also benefits, since users can stop generation as soon as the answer has been produced. However, although the post-thinking mechanism brings many advantages and improves the overall performance of the SLM on specific tasks, it may lose the ability, compared with the pre-thinking mechanism, to reason about questions and decompose complex questions into simpler sub-questions. We therefore propose a plug-and-play adaptive-thinking mechanism that uses soft prompt tuning to combine the merits of the pre-thinking and post-thinking mechanisms: a perception module adaptively prompts the SLM to answer first or think first based on the perceived complexity of the question. Extensive experiments across 12 reasoning tasks and 2 representative language models demonstrate the effectiveness of the proposed mechanism.
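The contrast between the two target orderings, and the early-stopping benefit of answer-first generation, can be sketched as follows. This is a minimal illustration only: the labels "Answer:" and "Rationale:" are hypothetical placeholders, not the paper's actual serialization format.

```python
def build_target(answer: str, rationale: str, mode: str) -> str:
    """Construct the supervision sequence for one training sample."""
    if mode == "pre-thinking":
        # Rationale first: the answer is conditioned on (and therefore
        # sensitive to) every token of the generated rationale.
        return f"Rationale: {rationale} Answer: {answer}"
    elif mode == "post-thinking":
        # Answer first: the answer cannot be derailed by rationale errors,
        # and generation can stop early at inference time.
        return f"Answer: {answer} Rationale: {rationale}"
    raise ValueError(f"unknown mode: {mode}")

def extract_answer_early(generated: str) -> str:
    """Post-thinking inference: keep only the answer span.

    In streaming generation, decoding could stop as soon as the
    rationale marker appears, saving the cost of the rationale.
    """
    return generated.split("Rationale:")[0].removeprefix("Answer:").strip()

post = build_target("42", "6 times 7 equals 42", "post-thinking")
print(extract_answer_early(post))  # -> 42
```

Under pre-thinking, any corrupted rationale token shifts the context the answer is decoded from; under post-thinking, the rationale appears only after the answer, so at training time its loss acts as an additional error signal while at inference time it can be skipped entirely.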