Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality training data. However, existing approaches face two challenges: (i) indiscriminate generation that ignores the solver's ability, either yielding low-value problems or relying on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning during problem generation, which leads to shallow problem variants. In this paper, we develop a problem generator that reasons explicitly to plan problem directions before synthesis and adapts difficulty to the solver's ability. Specifically, we construct pairs of related problems and augment them with intermediate problem-design chains of thought (CoT) produced by a reasoning model; these data are used to bootstrap problem-design strategies in the generator. We then treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty and produce complementary problems near the edge of the solver's competence. Extensive experiments on 10 mathematical and general reasoning benchmarks show that our framework achieves a cumulative average improvement of 3.4%, demonstrating robust generalization across both language and vision-language models.