Reinforcement learning (RL) can elicit strong reasoning in large language models (LLMs), yet most open efforts focus on math and code. We propose Reasoning Curriculum, a simple two-stage curriculum that first elicits reasoning skills in pretraining-aligned domains such as math, then adapts and refines these skills across other domains via joint RL. Stage 1 performs a brief cold start followed by math-only RL with verifiable rewards to develop reasoning skills. Stage 2 runs joint RL on mixed-domain data to transfer and consolidate these skills. The curriculum is minimal and backbone-agnostic, requiring no specialized reward models beyond standard verifiability checks. Evaluated with Qwen3-4B and Llama-3.1-8B on a multi-domain suite, Reasoning Curriculum yields consistent gains. Ablations and a cognitive-skill analysis indicate that both stages are necessary and that math-first elicitation increases cognitive behaviors important for solving complex problems. Reasoning Curriculum provides a compact, easy-to-adopt recipe for general reasoning.
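To make the two-stage recipe concrete, the sketch below outlines the training flow under an assumed RLVR-style setup. All names and interfaces here (`sft`, `generate`, `policy_update`, `sample_batch`, the reward function, and the step counts) are hypothetical placeholders for illustration, not the paper's actual implementation or API.

```python
# Minimal sketch of the Reasoning Curriculum training flow, assuming a
# generic policy-gradient RL loop with verifiable (rule-based) rewards.
# Every identifier below is a hypothetical placeholder.

def verify(prediction: str, reference: str) -> float:
    """Verifiable reward: 1.0 iff the extracted final answer matches."""
    return float(prediction.strip() == reference.strip())

def train_rl(model, dataset, steps: int):
    """Standard RL-with-verifiable-rewards loop (e.g., a GRPO/PPO-style update)."""
    for _ in range(steps):
        prompts, references = dataset.sample_batch()
        rollouts = [model.generate(p) for p in prompts]
        rewards = [verify(r, ref) for r, ref in zip(rollouts, references)]
        model.policy_update(prompts, rollouts, rewards)  # policy-gradient step
    return model

def reasoning_curriculum(model, cold_start_data, math_data, mixed_data,
                         sft_steps=500, math_steps=2000, joint_steps=2000):
    # Stage 1: brief cold start (SFT), then math-only RL to elicit
    # reasoning skills in a pretraining-aligned domain.
    model.sft(cold_start_data, steps=sft_steps)  # hypothetical SFT interface
    model = train_rl(model, math_data, steps=math_steps)
    # Stage 2: joint RL on mixed-domain data to transfer and consolidate
    # the elicited skills across domains.
    model = train_rl(model, mixed_data, steps=joint_steps)
    return model
```

The same binary verifiability check serves as the reward in both stages, reflecting the claim that no specialized reward models are required; only the data mixture changes between stages.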