Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math

Reinforcement learning (RL) can elicit strong reasoning in large language models (LLMs), yet most open efforts focus on math and code. We propose Reasoning Curriculum, a simple two-stage curriculum that first elicits reasoning skills in pretraining-aligned domains such as math, then adapts and refines these skills across other domains via joint RL. Stage 1 performs a brief cold start followed by math-only RL with verifiable rewards to develop reasoning skills. Stage 2 runs joint RL on mixed-domain data to transfer and consolidate these skills. The curriculum is minimal and backbone-agnostic, requiring no specialized reward models beyond standard verifiability checks. Evaluated on Qwen3-4B and Llama-3.1-8B over a multi-domain suite, Reasoning Curriculum yields consistent gains. Ablations and a cognitive-skill analysis indicate that both stages are necessary and that math-first elicitation increases cognitive behaviors important for solving complex problems. Reasoning Curriculum provides a compact, easy-to-adopt recipe for general reasoning.
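
To make the two-stage structure concrete, the following is a minimal sketch of how the schedule and a verifiable reward might be expressed. It is not the authors' implementation: the `Stage` type, the stage names, the non-math domain names ("code", "science", "logic"), and the exact-match reward are all illustrative assumptions, chosen only to show a cold start, then math-only RL, then joint RL over mixed domains, with a reward that needs no learned reward model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One phase of the curriculum (hypothetical structure, for illustration)."""
    name: str
    domains: tuple  # domains sampled during this phase
    method: str     # "cold_start" or "rl"


# Assumed schedule: Stage 1 is a cold start plus math-only RL,
# Stage 2 is joint RL on mixed-domain data.
# Domain names other than "math" are placeholders, not taken from the paper.
CURRICULUM = (
    Stage("stage1_cold_start", ("math",), "cold_start"),
    Stage("stage1_math_rl", ("math",), "rl"),
    Stage("stage2_joint_rl", ("math", "code", "science", "logic"), "rl"),
)


def normalize(answer: str) -> str:
    """Lowercase the answer and collapse whitespace before comparison."""
    return " ".join(answer.strip().lower().split())


def verifiable_reward(prediction: str, reference: str) -> float:
    """Return 1.0 on a normalized exact match, else 0.0.

    This is one simple form of verifiability check; real checkers
    (for example symbolic math equivalence or unit tests) are more involved.
    """
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def domains_for(stage_name: str) -> tuple:
    """Look up which domains a named phase samples from."""
    for stage in CURRICULUM:
        if stage.name == stage_name:
            return stage.domains
    raise KeyError(stage_name)
```

Under these assumptions, a training loop would walk `CURRICULUM` in order, sample prompts only from `stage.domains`, and score RL rollouts with `verifiable_reward`; swapping in a domain-specific checker would not change the schedule itself.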
