Complex reasoning problems often involve implicit spatial and geometric relationships that are not explicitly encoded in text. While recent reasoning models perform well across many domains, purely text-based reasoning struggles to capture structural constraints in complex settings. In this paper, we introduce FIGR, which integrates executable visual construction into multi-turn reasoning via end-to-end reinforcement learning. Rather than relying solely on textual chains of thought, FIGR externalizes intermediate hypotheses by generating executable code that constructs diagrams within the reasoning loop. An adaptive reward mechanism selectively regulates when visual construction is invoked, enabling more consistent reasoning over latent global properties that are difficult to infer from text alone. Experiments on eight challenging mathematical benchmarks demonstrate that FIGR outperforms strong text-only chain-of-thought baselines, improving over the base model by 13.12% on AIME 2025 and 11.00% on BeyondAIME. These results highlight the effectiveness of FIGR's precise, controllable figure construction in enhancing complex reasoning.
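To make the described loop concrete, the following is a minimal sketch of how interleaving textual reasoning with executable figure construction, plus a gated reward, might look. All interfaces here (`policy.generate`, `execute_figure_code`, `figure_note`, the reward weight `lam`, and the gating rule) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a FIGR-style multi-turn reasoning loop.
# Interface names and reward weights are hypothetical placeholders.
import re

FIG_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def execute_figure_code(code):
    """Hypothetical executor: run model-written construction code in an
    isolated namespace and return whatever summary it leaves in `figure_note`.
    A real system would sandbox this execution."""
    namespace = {}
    exec(code, namespace)
    return namespace.get("figure_note", "[figure rendered]")

def run_episode(policy, problem, max_turns=4):
    """Interleave textual reasoning turns with executable diagram construction."""
    transcript = problem
    for _ in range(max_turns):
        step = policy.generate(transcript)        # hypothetical LLM call
        transcript += step
        match = FIG_BLOCK.search(step)
        if match is None:                         # purely textual turn
            continue
        try:
            observation = execute_figure_code(match.group(1))
        except Exception as err:
            observation = f"[construction failed: {err}]"
        # Feed the execution result back into the reasoning context.
        transcript += f"\n<figure_feedback>{observation}</figure_feedback>\n"
    return transcript

def adaptive_reward(answer_correct, used_figure, figure_helped, lam=0.1):
    """Sketch of a reward that regulates when visual construction is invoked:
    correctness dominates, and the figure bonus is granted only when the
    construction plausibly contributed (this gating rule is an assumption)."""
    reward = 1.0 if answer_correct else 0.0
    if used_figure:
        reward += lam if figure_helped else -lam  # discourage gratuitous figures
    return reward
```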