GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Recent developments in multi-view image editing with generative models have brought us a step closer to general 3D content generation and customization. Most existing works rely on the geometry of the unedited scene and therefore focus on rigid or appearance-only edits, which limits them to edits that preserve the underlying scene structure. Other approaches are trained for specific image editing tasks, such as object removal and addition. Despite this progress, general nonrigid edits, i.e., edits that substantially change the scene geometry, remain challenging for existing methods. We propose GeM-NR, a fast and flexible training-free approach for general multi-view consistent image editing, including edits that drastically change the geometry and appearance of the scene. Given an anchor image edited with a chosen backbone editor (such as FLUX, Qwen, or BrushNet) and an unedited query image, GeM-NR edits the query image consistently with the anchor edit. The method has three stages: (i) depth map estimation, where we propose a strategy to maximize the alignment between the 3D point clouds of the edited and unedited scenes, (ii) projection onto the query viewpoint, and (iii) refinement of the projected image conditioned on the unedited query. The conditioning-based formulation scales well from two to many views of an object. We demonstrate that our method handles edits with significant changes in geometry and appearance, which existing methods struggle with. We perform an extensive evaluation showing that our method improves consistency across a wide variety of edit tasks, including generating 3D representations of the edited scene. Both quantitative and qualitative results indicate state-of-the-art performance in terms of edit quality as well as geometric and photometric consistency across multiple views.
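
To make stage (ii) concrete, the following is a minimal sketch of depth-based reprojection of a single anchor pixel into a query view. It is not the GeM-NR implementation: the abstract does not specify a camera model, so the sketch assumes a pinhole camera with shared intrinsics and a known relative pose. The function names (`unproject`, `transform`, `project`, `warp_pixel`) and all parameter values are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: pinhole model, shared intrinsics, and a known
# anchor-to-query pose are assumptions, not details taken from the paper.

def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the anchor camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(point, R, t):
    """Apply the rigid transform p' = R p + t (R is a 3x3 nested list, t a 3-vector)."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a 3D point in the query camera frame to pixel coordinates.

    Returns None when the point lies on or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def warp_pixel(u, v, depth, K, R, t):
    """Map an anchor pixel to the query view via its estimated depth.

    K is (fx, fy, cx, cy); R and t take anchor-frame points to the query frame.
    """
    fx, fy, cx, cy = K
    p_anchor = unproject(u, v, depth, fx, fy, cx, cy)
    p_query = transform(p_anchor, R, t)
    return project(p_query, fx, fy, cx, cy)


# Hypothetical example values.
K = (500.0, 500.0, 320.0, 240.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = warp_pixel(320.0, 240.0, 2.0, K, IDENTITY, (0.1, 0.0, 0.0))
```

With an identity rotation and a translation of 0.1 along x, the principal-point pixel at depth 2 moves horizontally by about `fx * 0.1 / 2 = 25` pixels, so `shifted` is approximately `(345, 240)`. Applying this mapping to every pixel of the edited anchor yields a sparse image in the query view with holes at disocclusions, which is the kind of input that a refinement stage such as (iii) would then complete.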
