Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a cornerstone for shaping the coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that sit near the edge of the model's competence. Prior studies often rely on heuristic seed expansion for data synthesis, which limits both the novelty and the difficulty of the resulting tasks. Consequently, the training value of such data fails to grow in proportion to the amount of data synthesized. To address this, we propose Atomic Decomposition and Recombination (ADR), a framework that decomposes code tasks into atomic elements and recombines those elements in a controlled manner, thereby generating genuinely novel and challenging verifiable code tasks. Experiments and analysis show that ADR achieves higher originality, difficulty, diversity, and test quality than existing baselines, and that RLVR training on ADR data consistently yields larger gains in coding ability across diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work points toward a new paradigm for novel code task synthesis and scalable RLVR training.
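As a rough illustration of the decompose-then-recombine idea described above, the following minimal sketch pools the atomic elements of a few seed tasks and enumerates combinations that no seed task already covers. It is not the ADR implementation: the seed tasks, the atom names, the `decompose` and `recombine` functions, and the subset-based novelty check are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical seed tasks, each annotated with the atomic elements it
# exercises. The atom names and this schema are illustrative assumptions,
# not taken from the paper.
SEED_TASKS = {
    "two_sum": {"hash_map", "array_scan"},
    "shortest_path": {"graph", "priority_queue"},
    "lru_cache": {"hash_map", "linked_list"},
}


def decompose(tasks):
    """Collect the pool of distinct atomic elements across all seed tasks."""
    atoms = set()
    for task_atoms in tasks.values():
        atoms |= task_atoms
    return sorted(atoms)


def recombine(atoms, tasks, k=2):
    """Enumerate k-atom combinations that no single seed task already covers.

    The novelty check stands in for the "control" in this sketch: a
    combination is kept only if it is not a subset of any seed task's atoms.
    """
    seen = [frozenset(task_atoms) for task_atoms in tasks.values()]
    return [
        combo
        for combo in combinations(atoms, k)
        if not any(set(combo) <= s for s in seen)
    ]


atoms = decompose(SEED_TASKS)
novel = recombine(atoms, SEED_TASKS, k=2)
for combo in novel:
    print(combo)
```

In a full pipeline, each surviving combination would still need to be turned into a concrete task statement with verifying tests, and filtered for difficulty; those steps are outside the scope of this sketch.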