Gaussian Splatting has recently achieved high-quality, real-time novel-view synthesis of 3D scenes. However, it focuses solely on appearance and geometry modeling and lacks fine-grained, object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during differentiable rendering using 2D mask predictions from the Segment Anything Model (SAM), together with a 3D spatial consistency regularization that we introduce. Compared with the implicit NeRF representation, we show that discrete, grouped 3D Gaussians can reconstruct, segment, and edit anything in 3D with high visual quality, fine granularity, and efficiency. Building on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which proves effective in versatile scene editing applications, including 3D object removal, inpainting, colorization, style transfer, and scene recomposition. Our code and models are available at https://github.com/lkeab/gaussian-grouping.
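To make the supervision idea concrete, the following is a minimal NumPy sketch of how a per-Gaussian Identity Encoding might be trained from 2D masks and regularized in 3D. It is not the released implementation. The function names, the linear classifier, the cross-entropy loss, and the k-nearest-neighbour KL term are all illustrative assumptions; the abstract states only that the encodings are supervised through differentiable rendering with SAM masks and a 3D spatial consistency regularization.

```python
import numpy as np

EPS = 1e-12  # numerical guard for logarithms


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def render_identity(encodings, weights):
    """Blend per-Gaussian identity encodings into one pixel feature.

    encodings: (N, D) encodings of the N Gaussians hit by a ray.
    weights:   (N,) compositing weights for those Gaussians
               (assumed to be supplied by the renderer).
    """
    return weights @ encodings


def mask_loss(pixel_feature, classifier, mask_id):
    """Cross-entropy between the classified pixel feature and a 2D mask ID.

    classifier: (D, K) hypothetical linear layer mapping features to K IDs.
    mask_id:    integer ID taken from a 2D mask (e.g. one predicted by SAM).
    """
    probs = softmax(pixel_feature @ classifier)
    return -np.log(probs[mask_id] + EPS)


def spatial_consistency_loss(positions, encodings, classifier, k=2):
    """Mean KL divergence between each Gaussian and its k nearest 3D neighbours.

    This is one assumed form of a 3D spatial consistency regularizer:
    Gaussians that are close in space are pushed towards the same identity.
    """
    probs = softmax(encodings @ classifier)                        # (N, K)
    diffs = positions[:, None, :] - positions[None, :, :]          # (N, N, 3)
    dists = np.linalg.norm(diffs, axis=-1)                         # (N, N)
    np.fill_diagonal(dists, np.inf)                                # exclude self
    neighbours = np.argsort(dists, axis=1)[:, :k]                  # (N, k)
    neighbour_probs = probs[neighbours]                            # (N, k, K)
    own = probs[:, None, :]                                        # (N, 1, K)
    kl = (own * (np.log(own + EPS) - np.log(neighbour_probs + EPS))).sum(-1)
    return kl.mean()
```

In a training loop, the two terms would typically be summed with the usual image reconstruction loss and minimized with a gradient-based optimizer; the relative weighting is a design choice that this sketch leaves open.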