Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term "Attention Collapse," where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted edit regions, SpatialReward grounds semantic judgments in pixel-level evidence, significantly enhancing evaluative accuracy. Trained on a curated 260k spatial-aware dataset, our model achieves state-of-the-art performance on MMRB2 and EditReward-Bench, and outperforms proprietary evaluators on our proposed MultiEditReward-Bench. Furthermore, SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench, surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45). These results demonstrate that spatial reasoning is essential for unlocking effective alignment in image editing.