Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine, which converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse outcome signals with dense rewards based on the Skeleton Rate. Experiments show that SGVR not only enhances geometric performance (+9.7%) but also generalizes well, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%), demonstrating broad applicability across diverse domains.
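To make the reward structure concrete, the following is a minimal sketch of how a subgoal-based dense reward could be computed. All names (`skeleton_rate`, `sgvr_reward`, the tolerance and blending parameters) are hypothetical illustrations, not the paper's actual implementation: the Skeleton Rate is modeled here as the fraction of reference numeric subgoals that the model's solution reproduces, blended with the final-answer signal.

```python
from typing import List


def skeleton_rate(predicted: List[float], targets: List[float],
                  tol: float = 1e-6) -> float:
    """Hypothetical sketch: fraction of reference subgoal values that
    appear (within tolerance) among the values the model derived."""
    if not targets:
        return 0.0
    hits = sum(
        any(abs(p - t) <= tol for p in predicted)
        for t in targets
    )
    return hits / len(targets)


def sgvr_reward(predicted: List[float], targets: List[float],
                final_correct: bool, alpha: float = 0.5) -> float:
    """Dense reward blending subgoal coverage with the sparse
    final-answer signal (alpha is an assumed weighting)."""
    return alpha * skeleton_rate(predicted, targets) + (1 - alpha) * float(final_correct)
```

Under this sketch, a solution that reaches the right answer but matches only half of the verified intermediate subgoals receives a lower reward than one with a complete deduction chain, which is the distinction outcome-only supervision cannot make.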