Recent advances in diffusion and flow models have greatly improved text-based image editing, yet methods that edit each image independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic when editing 3D representations such as NeRFs or Gaussian splat models. We propose a training-free guidance framework that enforces multi-view consistency during the image editing process. The key idea is that corresponding points across views should look similar after editing. To achieve this, we introduce a consistency loss that guides the denoising process toward coherent edits. The framework is flexible and can be combined with a wide range of image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian splat editing with sharp details and strong fidelity to user-specified text prompts. Please refer to our project page for video results: https://3d-consistent-editing.github.io/
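The key idea in the abstract, that corresponding points should look similar after editing, can be illustrated with a small sketch. The abstract does not specify the form of the consistency loss or how it enters the denoising process, so everything below is an assumption made for illustration: the loss is taken to be a mean squared color difference at given pixel correspondences, the guidance is a plain gradient step on the current image estimates, and the names `consistency_loss_and_grads`, `guided_step`, and `step_size` are hypothetical.

```python
import numpy as np


def consistency_loss_and_grads(img_a, img_b, pts_a, pts_b):
    """Mean squared color difference at corresponding pixels (assumed loss form).

    img_a, img_b: float arrays of shape (H, W, 3), current edit estimates of two views.
    pts_a, pts_b: int arrays of shape (N, 2) with (row, col) correspondences.
    Returns the scalar loss and its gradients with respect to both images.
    """
    ra, ca = pts_a[:, 0], pts_a[:, 1]
    rb, cb = pts_b[:, 0], pts_b[:, 1]
    diff = img_a[ra, ca] - img_b[rb, cb]  # shape (N, 3)
    loss = float(np.mean(np.sum(diff ** 2, axis=1)))

    # Analytic gradient of the mean of squared differences.
    scale = 2.0 / len(pts_a)
    grad_a = np.zeros_like(img_a)
    grad_b = np.zeros_like(img_b)
    # np.add.at accumulates correctly even if a pixel appears more than once.
    np.add.at(grad_a, (ra, ca), scale * diff)
    np.add.at(grad_b, (rb, cb), -scale * diff)
    return loss, grad_a, grad_b


def guided_step(img_a, img_b, pts_a, pts_b, step_size=0.1):
    """One hypothetical guidance update: nudge both estimates toward agreement."""
    loss, grad_a, grad_b = consistency_loss_and_grads(img_a, img_b, pts_a, pts_b)
    return img_a - step_size * grad_a, img_b - step_size * grad_b, loss
```

In a training-free guidance setting, an update of this kind would typically be interleaved with the steps of an existing editing method rather than replace them. How the correspondences are obtained and at which denoising steps guidance is applied are not stated in the abstract and are left open here.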