Recent Gaussian Splatting achieves high-quality, real-time novel-view synthesis of 3D scenes. However, it concentrates solely on appearance and geometry modeling and lacks fine-grained, object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during differentiable rendering by leveraging the 2D mask predictions of SAM, along with an introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. Our code and models will be available at https://github.com/lkeab/gaussian-grouping.
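To make the supervision idea concrete, the following is a minimal NumPy sketch of one plausible reading of it for a single pixel: per-Gaussian Identity Encodings are alpha-composited front to back, the rendered feature is classified, and the result is compared against a 2D mask label. It is not the released implementation. The function names, the encoding dimension, the number of identities, the linear classifier and the cross-entropy loss are all assumptions made for illustration, the mask label stands in for a SAM prediction, and the 3D spatial consistency regularization is omitted.

```python
import numpy as np


def composite_identity(encodings, alphas):
    """Front-to-back alpha compositing of identity encodings for one pixel.

    encodings: (N, D) array, one row per depth-sorted Gaussian (hypothetical layout).
    alphas:    (N,) array of per-Gaussian opacities at this pixel, in [0, 1].
    """
    # Transmittance reaching each Gaussian: product of (1 - alpha) of those in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas[:-1])))
    weights = alphas * transmittance
    return weights @ encodings


def identity_loss(pixel_feature, classifier, mask_id):
    """Cross-entropy between the classified rendered feature and a 2D mask label.

    classifier: (D, K) linear map to K identity logits (an assumption of this sketch).
    mask_id:    integer label of the pixel, standing in for a SAM mask identity.
    """
    logits = pixel_feature @ classifier
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[mask_id]


rng = np.random.default_rng(0)
num_gaussians, enc_dim, num_ids = 5, 16, 4  # hypothetical sizes
encodings = rng.normal(size=(num_gaussians, enc_dim))
alphas = rng.uniform(0.1, 0.9, size=num_gaussians)
classifier = rng.normal(size=(enc_dim, num_ids))

feature = composite_identity(encodings, alphas)
loss = identity_loss(feature, classifier, mask_id=2)
```

In a real training loop the compositing would run inside the differentiable rasterizer so that gradients of the loss flow back to each Gaussian's encoding; the sketch only shows the forward computation.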