Recent advances highlight the success of instruction tuning large language models (LLMs) on Chain-of-Thought (CoT) data for mathematical reasoning tasks. Even after such fine-tuning, however, challenges persist: the generated CoT can contain incorrect, missing, or redundant steps, which lead to inaccurate answer predictions. To alleviate this problem, we propose a dual instruction tuning strategy that meticulously models mathematical reasoning from both the forward and the reverse direction. It introduces two tasks, Intermediate Reasoning State Prediction (forward reasoning) and Instruction Reconstruction (reverse reasoning), to strengthen the LLMs' understanding and execution of instructions. Training instances for these tasks are constructed from existing mathematical instruction tuning datasets. The LLMs then undergo multi-task fine-tuning on both the existing mathematical instructions and the newly constructed data. Comprehensive experiments validate the effectiveness and domain generalization of the dual instruction tuning strategy across various mathematical reasoning tasks.
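The abstract does not specify how the forward and reverse training instances are derived from an existing CoT example, so the sketch below is only one hypothetical way to do it, not the authors' construction. It assumes each example is stored as a question, a list of CoT steps, and a final answer (the `MathExample` class, all field names, and all prompt templates are illustrative inventions). The forward task is approximated as "given the question and the first k steps, predict step k+1", the reverse task as "given the full solution and answer, reconstruct the question", and the multi-task mixture simply concatenates the original, forward, and reverse instances and shuffles them.

```python
import random
from dataclasses import dataclass


@dataclass
class MathExample:
    # Hypothetical storage format for one CoT instruction-tuning example.
    question: str
    steps: list[str]  # CoT solution already split into reasoning steps
    answer: str


def original_instance(ex: MathExample) -> dict:
    """Standard CoT instruction: question in, full step-by-step solution out."""
    solution = "\n".join(ex.steps + [f"The answer is {ex.answer}."])
    return {"input": f"Question: {ex.question}\nAnswer step by step:",
            "target": solution}


def forward_instances(ex: MathExample) -> list[dict]:
    """Hypothetical forward task: one instance per intermediate state,
    where the model sees the question plus steps[:k] and predicts steps[k]."""
    out = []
    for k in range(len(ex.steps)):
        prefix = "\n".join(ex.steps[:k])  # empty for the first step
        out.append({
            "input": f"Question: {ex.question}\nSteps so far:\n{prefix}\nNext step:",
            "target": ex.steps[k],
        })
    return out


def reverse_instance(ex: MathExample) -> dict:
    """Hypothetical reverse task: reconstruct the question from the solution."""
    solution = "\n".join(ex.steps)
    return {"input": f"Solution:\n{solution}\nAnswer: {ex.answer}\nOriginal question:",
            "target": ex.question}


def build_mixture(data: list[MathExample], seed: int = 0) -> list[dict]:
    """Concatenate original, forward, and reverse instances, then shuffle
    them into a single multi-task fine-tuning set (uniform mixing assumed)."""
    mixture = []
    for ex in data:
        mixture.append(original_instance(ex))
        mixture.extend(forward_instances(ex))
        mixture.append(reverse_instance(ex))
    random.Random(seed).shuffle(mixture)
    return mixture


if __name__ == "__main__":
    ex = MathExample("Tom has 3 apples and buys 2 more. How many does he have?",
                     ["Tom starts with 3 apples.", "3 + 2 = 5."], "5")
    for inst in build_mixture([ex]):
        print(inst["target"])
```

For a two-step example this yields four instances (one original, two forward, one reverse). Uniform shuffling is just the simplest mixing choice; task weighting or sampling ratios are left open because the abstract does not state them.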