Training language models to solve complex mathematical problems benefits from curriculum learning, which progressively trains on simpler subproblems. However, existing decomposition methods are often heuristic, offering no guarantees that subproblems are simpler, that solving them aids the parent task, or that their relationships are mathematically grounded. We observe that symbolic differentiation provides a natural structure for verified decomposition: calculus rules explicitly define how expressions reduce to simpler components with provable properties. We introduce Verify-RL, a framework in which every parent-child decomposition satisfies three verifiable conditions: strictly decreasing structural complexity, solution containment, and formal rule derivation. Unlike heuristic methods, where a significant fraction of decompositions are invalid, our properties admit automatic verification through symbolic computation, achieving "verification by construction." Experiments demonstrate that eliminating invalid decompositions yields sizable gains: accuracy on the hardest problems more than doubles, from 32% to 68%, with a 40% relative improvement overall.
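The three verifiable conditions can be illustrated concretely with a chain-rule decomposition. The sketch below is a minimal illustration using SymPy, not the Verify-RL implementation; the `complexity` measure (expression-tree node count) and the subproblem split are assumptions chosen for the example.

```python
# Minimal sketch of "verification by construction" for one chain-rule
# decomposition, using SymPy. Names and the complexity metric are
# illustrative assumptions, not Verify-RL's actual API.
import sympy as sp

x, u = sp.symbols('x u')

def complexity(expr):
    """Structural complexity: number of nodes in the expression tree."""
    return sum(1 for _ in sp.preorder_traversal(expr))

# Parent task: differentiate sin(x**2). The chain rule splits it into
# two subproblems: differentiate sin(u) and differentiate x**2.
parent = sp.sin(x**2)
outer, inner = sp.sin(u), x**2

# Condition 1: strictly decreasing structural complexity.
assert complexity(outer) < complexity(parent)
assert complexity(inner) < complexity(parent)

# Conditions 2 and 3: composing the child solutions via the chain rule
# must reproduce the parent's derivative (solution containment and
# formal rule derivation, checked symbolically).
child_outer = sp.diff(outer, u)              # cos(u)
child_inner = sp.diff(inner, x)              # 2*x
composed = child_outer.subs(u, inner) * child_inner
assert sp.simplify(composed - sp.diff(parent, x)) == 0
```

Because each check is a symbolic identity rather than a heuristic score, an invalid decomposition fails the assertions automatically and never enters the curriculum.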