We propose GaussCtrl, a text-driven method for editing a 3D scene reconstructed with 3D Gaussian Splatting (3DGS). Our method first renders a collection of images from the 3DGS model and edits them with a pre-trained 2D diffusion model (ControlNet) guided by the input prompt. The edited images are then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which edits all images together instead of iteratively editing one image while updating the 3D model, as previous works do. This leads to faster editing as well as higher visual quality. It is achieved through two components: (a) depth-conditioned editing, which enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps; and (b) attention-based latent code alignment, which unifies the appearance of the edited images by conditioning their editing on several reference views through self- and cross-view attention between the images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.
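The idea behind the attention-based latent code alignment in (b) can be illustrated with a minimal NumPy sketch. It assumes a single-head attention layer acting on flattened latent tokens. The function name `aligned_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, and this is not the GaussCtrl implementation. In practice such attention would sit inside the diffusion model's denoising network, while this sketch isolates only the mechanism. Queries come from the view being edited. Keys and values are drawn from both that view (self-attention) and the reference views (cross-view attention), so each edited image can borrow appearance from the references.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aligned_attention(target: np.ndarray, references: list,
                      w_q: np.ndarray, w_k: np.ndarray,
                      w_v: np.ndarray) -> np.ndarray:
    """Hypothetical self + cross-view attention (illustrative sketch only).

    target:     (N, C) latent tokens of the view being edited.
    references: list of (M_i, C) latent tokens from reference views.
    w_q, w_k, w_v: (C, D) projection matrices (assumed, random here).
    Returns (N, D) features for the target view.
    """
    q = target @ w_q                                   # queries: target view only
    context = np.concatenate([target, *references], axis=0)  # self + reference tokens
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (N, N + sum of M_i)
    return softmax(scores, axis=-1) @ v


# Usage: one target view attending to itself and two reference views.
rng = np.random.default_rng(0)
C, D = 8, 8
target = rng.standard_normal((16, C))
refs = [rng.standard_normal((16, C)) for _ in range(2)]
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))
out = aligned_attention(target, refs, w_q, w_k, w_v)
print(out.shape)  # (16, 8)
```

With an empty `references` list, the function reduces to ordinary self-attention. Adding reference tokens lets the target's queries attend to the references' keys and values, which is the sense in which the references steer the target's appearance.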