Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training and often leads to overthinking, producing redundant intermediate steps. To improve learning and reasoning efficiency while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases data complexity based on the model's proficiency across multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning that determines what knowledge the model lacks and needs to learn in progressive stages; (ii) a hybrid Thinking/NoThinking reasoning paradigm that balances accuracy and efficiency by enabling or disabling the Thinking mode. Our experiments show that TACLer yields a twofold advantage in learning and reasoning: (i) it reduces computational cost, cutting training compute by over 50% compared to long-thinking models and reducing inference token usage by over 42% relative to the base model; and (ii) it improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets with complex problems.
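The tailored-curriculum idea above — advancing each RL stage with problems at the frontier of the model's current proficiency — can be sketched as follows. This is a minimal illustration, not TACLer's actual implementation: the `solve` sampler, pass-rate thresholds, and sample count are all illustrative assumptions.

```python
def stage_pass_rate(solve, problem, n_samples=4):
    """Estimate proficiency on a problem as the fraction of
    n sampled attempts that succeed (solve returns True/False)."""
    return sum(bool(solve(problem)) for _ in range(n_samples)) / n_samples

def build_stage_curriculum(solve, pool, low=0.25, high=0.75):
    """Keep problems the model sometimes, but not reliably, solves.
    Items it already masters (rate > high) or cannot yet attempt
    (rate < low) are dropped, so each stage targets knowledge the
    model lacks but can plausibly acquire next."""
    return [p for p in pool
            if low <= stage_pass_rate(solve, p) <= high]
```

Re-running this selection between stages, as the model's pass rates shift, yields a curriculum that tracks the model rather than a fixed difficulty schedule.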