Recent advances in image diffusion models have led to notable improvements in the generation of high-quality images. In combination with Neural Radiance Fields (NeRFs), they have enabled new opportunities in 3D generation. However, most generative 3D approaches are object-centric, and applying them to editing existing photorealistic scenes is not trivial. We propose SIGNeRF, a novel approach for fast and controllable NeRF scene editing and scene-integrated object generation. A new generative update strategy ensures 3D consistency across the edited images without requiring iterative optimization. We find that depth-conditioned diffusion models inherently possess the capability to generate 3D-consistent views when asked for a grid of images instead of single views. Based on this insight, we introduce a multi-view reference sheet of modified images. Our method updates an image collection consistently based on the reference sheet and refines the original NeRF with the newly generated image set in one go. By exploiting the depth-conditioning mechanism of the image diffusion model, we gain fine control over the spatial location of the edit and enforce shape guidance via a selected region or an external mesh.
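The abstract describes requesting a grid of images rather than single views, but it does not say how such a grid is assembled. The sketch below illustrates only the tiling bookkeeping for a reference sheet: packing equally sized views into one image and cutting it back into per-view images. The function names, the row-major layout, and the list-of-rows image representation are assumptions made for illustration; the diffusion model call and the depth conditioning are not shown.

```python
from typing import List

# Assumed image representation: a list of pixel rows, where a pixel can be
# any value (an int for grayscale, a tuple for RGB, and so on).
Image = List[list]


def make_reference_sheet(views: List[Image], rows: int, cols: int) -> Image:
    """Tile rows * cols equally sized views into one sheet, row-major."""
    if len(views) != rows * cols:
        raise ValueError("expected exactly rows * cols views")
    height, width = len(views[0]), len(views[0][0])
    for view in views:
        if len(view) != height or any(len(line) != width for line in view):
            raise ValueError("all views must have the same size")

    sheet: Image = []
    for r in range(rows):
        for y in range(height):
            line: list = []
            # Concatenate pixel row y of every view in grid row r.
            for c in range(cols):
                line.extend(views[r * cols + c][y])
            sheet.append(line)
    return sheet


def split_reference_sheet(sheet: Image, rows: int, cols: int) -> List[Image]:
    """Cut a sheet back into rows * cols views (inverse of the tiling)."""
    if len(sheet) % rows or len(sheet[0]) % cols:
        raise ValueError("sheet size must be divisible by the grid shape")
    height, width = len(sheet) // rows, len(sheet[0]) // cols

    views: List[Image] = []
    for r in range(rows):
        for c in range(cols):
            views.append([
                sheet[r * height + y][c * width:(c + 1) * width]
                for y in range(height)
            ])
    return views
```

Under these assumptions, one tiled sheet would be passed to the diffusion model in a single request and the edited result split back into individual views with `split_reference_sheet`.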