We propose GaussCtrl, a text-driven method for editing a 3D scene reconstructed with 3D Gaussian Splatting (3DGS). Our method first renders a collection of images from the 3DGS model and edits them with a pre-trained 2D diffusion model (ControlNet) according to the input prompt; the edited images are then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which allows all images to be edited together instead of iteratively editing one image while updating the 3D model, as in previous works. This leads to faster editing as well as higher visual quality. It is achieved by two components: (a) depth-conditioned editing, which enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps; and (b) attention-based latent code alignment, which unifies the appearance of the edited images by conditioning their editing on several reference views through self- and cross-view attention between the images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.
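As a rough illustration of component (b), the following minimal NumPy sketch shows one way self-attention and cross-view attention over reference latents could be combined. The abstract does not state how the two are merged, so the blending scheme, the `weight` parameter, the function names, and the shared projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration, not the method's actual implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; q has shape (n, d), k and v have shape (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def aligned_attention(latent, references, w_q, w_k, w_v, weight=0.5):
    """Blend self-attention with attention over reference-view latents.

    latent:     (n, d) latent tokens of the view being edited
    references: list of (n, d) latent tokens from the reference views
    weight:     hypothetical blending weight (an assumption, not from the source)
    """
    q = latent @ w_q
    # Self-attention: the view attends to its own tokens.
    self_out = attention(q, latent @ w_k, latent @ w_v)
    # Cross-view attention: the same queries attend to each reference view,
    # and the results are averaged over the reference views.
    cross_out = np.mean(
        [attention(q, ref @ w_k, ref @ w_v) for ref in references], axis=0
    )
    return weight * self_out + (1.0 - weight) * cross_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 16, 32
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
    view = rng.normal(size=(n_tokens, dim))
    refs = [rng.normal(size=(n_tokens, dim)) for _ in range(4)]
    out = aligned_attention(view, refs, w_q, w_k, w_v)
    print(out.shape)
```

Because every view draws keys and values from the same reference latents, the edited views are pulled toward a shared appearance; setting `weight=1.0` in this sketch recovers plain per-view self-attention.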