We propose GaussCtrl, a text-driven method for editing a 3D scene reconstructed by 3D Gaussian Splatting (3DGS). Our method first renders a collection of images using the 3DGS model and edits them with a pre-trained 2D diffusion model (ControlNet) according to the input prompt; the edited images are then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which allows all images to be edited together instead of iteratively editing one image at a time while updating the 3D model, as in previous works. This leads to faster editing as well as higher visual quality. It is achieved through two components: (a) depth-conditioned editing, which enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps; and (b) attention-based latent code alignment, which unifies the appearance of the edited images by conditioning their editing on several reference views through self and cross-view attention between the images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.
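The attention-based latent code alignment in (b) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names `attend` and `aligned_attention`, the blending weight `mix`, and the use of raw latent tokens as queries, keys, and values (a real diffusion model would apply learned projections inside its attention layers). The sketch only shows the general idea of a view attending both to its own tokens and to tokens pooled from reference views.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def aligned_attention(view_tokens, reference_views, mix=0.5):
    """Blend self-attention with cross-view attention to reference views.

    view_tokens:     latent tokens of the view being edited (list of vectors).
    reference_views: list of reference views, each a list of latent tokens.
    mix:             hypothetical weight; 1.0 is pure self-attention,
                     0.0 is pure cross-view attention.
    """
    # Self-attention: the view attends to its own tokens.
    self_out = attend(view_tokens, view_tokens, view_tokens)
    # Cross-view attention: the view attends to tokens pooled from all references.
    ref_pool = [token for ref in reference_views for token in ref]
    cross_out = attend(view_tokens, ref_pool, ref_pool)
    return [
        [mix * s + (1.0 - mix) * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(self_out, cross_out)
    ]


if __name__ == "__main__":
    view = [[1.0, 0.0], [0.0, 1.0]]
    references = [[[0.5, 0.5]], [[0.2, 0.8]]]
    print(aligned_attention(view, references, mix=0.5))
```

Because every edited view draws on the same pool of reference tokens, its output is pulled towards a shared appearance, which is the intuition behind conditioning all views on a few references. How the two attention terms are actually combined in GaussCtrl is not stated in the abstract, so the linear blend above should be read as a placeholder.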