We introduce ViCA-NeRF, the first view-consistency-aware method for 3D editing with text instructions. Beyond implicit neural radiance field (NeRF) modeling, our key insight is to exploit two sources of regularization that explicitly propagate editing information across views, thereby ensuring multi-view consistency. For geometric regularization, we use depth information derived from the NeRF to establish image correspondences between views. For learned regularization, we align the latent codes of edited and unedited images in the 2D diffusion model, which lets us edit key views and propagate the update throughout the entire scene. Combining these two strategies, ViCA-NeRF operates in two stages. In the first stage, we blend edits from different views to create a preliminary 3D edit. In the second stage, NeRF training further refines the scene's appearance. Experimental results demonstrate that, compared with the state of the art, ViCA-NeRF provides more flexible and more efficient (3 times faster) editing with greater consistency and detail. Our code is publicly available.
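The geometric regularization described above relies on depth to relate pixels across views. As a rough illustration of that idea only, and not the authors' implementation, the sketch below maps a single pixel from a source view to a target view by unprojecting it with its depth, applying the relative camera pose, and projecting it again. It assumes a pinhole camera with intrinsics shared by both views, a depth value measured along the camera z-axis (NeRF renderers often report distance along the ray instead, which would need converting first), and a known source-to-target rotation and translation. All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with z-depth `depth` to a 3D point in the source camera frame."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def transform(point: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Apply the rigid transform p' = R p + t (source camera frame -> target camera frame)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a 3D point in a camera frame to pixel coordinates; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def correspond(u: float, v: float, depth: float,
               rotation: Mat3, translation: Vec3,
               fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Find where source pixel (u, v) lands in the target view, given its depth."""
    point_src = unproject(u, v, depth, fx, fy, cx, cy)
    point_tgt = transform(point_src, rotation, translation)
    return project(point_tgt, fx, fy, cx, cy)


# Hypothetical example: the point seen at the image centre, 2 units away,
# viewed from a target camera whose frame is shifted by one unit along x.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
match = correspond(50.0, 50.0, 2.0, IDENTITY, (1.0, 0.0, 0.0),
                   fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

With correspondences of this kind, an edit made at a pixel in one key view could in principle be copied to or blended at the matching pixel in another view. A practical version would also need to handle occlusion and pixels that fall outside the target image, which this sketch does not.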