Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models fall short on fine-grained spatial manipulation, which motivates a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity via viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object-centric and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing. SpatialEdit-16B achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made public at https://github.com/EasonXiao-888/SpatialEdit.