Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

3D scene representations have gained immense popularity in recent years. Methods based on Neural Radiance Fields (NeRF) are versatile for traditional tasks such as novel view synthesis. More recently, work has emerged that extends NeRF beyond view synthesis to semantically aware tasks such as editing and segmentation, using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts that reduce feature quality. Recently, 3D Gaussian Splatting (3DGS) has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian Splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to address this problem efficiently. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method provides comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, ours is the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website at: https://feature-3dgs.github.io/
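
As a rough illustration of what splatting "arbitrary-dimension semantic features" can mean, the sketch below blends per-Gaussian vectors along a single pixel ray with standard front-to-back alpha compositing, then compares the result with a teacher feature using an L1 loss. It assumes that features are composited with the same weights as color and that the rendered and teacher features already share one dimension; the abstract confirms neither. The names `composite` and `l1_distillation_loss` are hypothetical, and the resolution and channel mismatch that the abstract highlights is not handled here.

```python
from typing import List, Sequence


def composite(alphas: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Front-to-back alpha blending of per-Gaussian vectors for one pixel.

    alphas[i] is the opacity contributed by the i-th depth-sorted Gaussian,
    values[i] is its attribute vector: 3 channels for RGB, or any length
    for a semantic feature (assumption: same blending rule for both).
    """
    if len(alphas) != len(values):
        raise ValueError("alphas and values must have the same length")
    if not values:
        return []
    out = [0.0] * len(values[0])
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for alpha, vec in zip(alphas, values):
        weight = alpha * transmittance
        for channel, v in enumerate(vec):
            out[channel] += weight * v
        transmittance *= 1.0 - alpha
    return out


def l1_distillation_loss(rendered: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean absolute difference between a rendered and a teacher feature.

    Hypothetical loss for illustration; assumes both vectors have equal length.
    """
    if len(rendered) != len(teacher):
        raise ValueError("feature dimensions must match")
    return sum(abs(r - t) for r, t in zip(rendered, teacher)) / len(rendered)


if __name__ == "__main__":
    alphas = [0.5, 0.5]
    rgb = composite(alphas, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    feature = composite(alphas, [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
    print(rgb)      # 3-channel color
    print(feature)  # 4-channel feature, blended with the same weights
    print(l1_distillation_loss(feature, [1.0, 0.0, 0.0, 0.0]))
```

The same function serves both outputs because the blending weights depend only on opacity and depth order, not on the number of channels being blended.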
