Applying Gaussian Splatting to perception tasks for 3D scene understanding is becoming increasingly popular. Most existing works focus on rendering 2D feature maps from novel viewpoints, which yields an imprecise 3D language field with outlier language features and ultimately fails to align objects in 3D space. Because these approaches extract features from masked images, they also lack essential contextual information, leading to inaccurate feature representations. To this end, we propose the Language-Embedded Surface Field (LangSurf), which accurately aligns the 3D language field with object surfaces, enabling precise 2D and 3D segmentation from text queries and broadening downstream tasks such as removal and editing. The core of LangSurf is a joint training strategy that flattens the language Gaussians onto object surfaces using geometry supervision and contrastive losses, assigning accurate language features to the Gaussians of objects. In addition, we introduce the Hierarchical-Context Awareness Module, which extracts features at the image level to capture contextual information and then performs hierarchical mask pooling with masks segmented by SAM to obtain fine-grained language features at different hierarchy levels. Extensive experiments on open-vocabulary 2D and 3D semantic segmentation demonstrate that LangSurf outperforms the previous state-of-the-art method LangSplat by a large margin. As shown in Fig.~\ref{fig:teaser}, our method can segment objects in 3D space, which improves its effectiveness in instance recognition, removal, and editing, as supported by comprehensive experiments. \href{https://langsurf.github.io}{Project Page}.
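The hierarchical mask pooling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dense, pixel-aligned image-level feature map of shape (H, W, C), boolean masks grouped by hierarchy level, and plain average pooling inside each mask. The function names `mask_pool` and `hierarchical_mask_pool` and the level keys are hypothetical.

```python
import numpy as np


def mask_pool(feature_map, masks):
    """Average an (H, W, C) feature map inside each of M boolean (H, W) masks.

    Assumption: plain mean pooling; the abstract does not specify the operator.
    Returns an (M, C) array; an empty mask yields a zero vector.
    """
    channels = feature_map.shape[-1]
    feats = feature_map.reshape(-1, channels)                      # (H*W, C)
    flat = masks.reshape(masks.shape[0], -1).astype(feats.dtype)   # (M, H*W)
    counts = flat.sum(axis=1, keepdims=True)                       # pixels per mask
    return (flat @ feats) / np.maximum(counts, 1.0)                # (M, C)


def hierarchical_mask_pool(feature_map, masks_by_level):
    """Apply mask pooling independently at every hierarchy level.

    masks_by_level maps a hypothetical level name to an (M, H, W) boolean array.
    """
    return {level: mask_pool(feature_map, m) for level, m in masks_by_level.items()}


if __name__ == "__main__":
    # Toy 2x2 feature map with a single channel.
    fmap = np.array([[[1.0], [2.0]], [[3.0], [4.0]]])
    masks = {
        "part": np.array([[[True, True], [False, False]]]),   # top row only
        "whole": np.array([[[True, True], [True, True]]]),    # entire image
    }
    pooled = hierarchical_mask_pool(fmap, masks)
    print(pooled["part"], pooled["whole"])
```

Because the features are computed once on the full image and only pooled afterwards, each mask-level feature retains context from outside the mask, which is the motivation the abstract gives for avoiding masked-image feature extraction.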