Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundation models (e.g., CLIP and SAM) to facilitate novel-view segmentation and semantic understanding, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and necessitates complex data preparation, thereby hindering view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both semantic representations and view-consistent instance indices for each Gaussian. We optimize IDSF with a two-step alternating strategy: semantics help to extract coherent instances in 3D space, while the resulting instances regularize the injection of stable semantics from 2D space. Additionally, we adopt a 2D-3D joint contrastive loss to enhance the complementarity between view-consistent 3D geometry and rich semantics during the bootstrapping process, enabling FreeGS to uniformly perform tasks such as novel-view semantic segmentation, object selection, and 3D object detection. Extensive experiments on the LERF-Mask, 3D-OVS, and ScanNet datasets demonstrate that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data-preprocessing workload.
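The abstract's 2D-3D joint contrastive objective can be illustrated with a minimal sketch. The exact loss used by FreeGS is not specified here, so the following assumes a symmetric InfoNCE formulation over N matched pairs of 2D pixel features and rendered 3D Gaussian features; the function name, the pairing scheme, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def joint_contrastive_loss(feat_2d, feat_3d, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    feat_2d: (N, D) features from a 2D foundation model at N sampled pixels.
    feat_3d: (N, D) rendered 3D semantic features at the same pixels.
    Matched rows are treated as positives; all other rows as negatives.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    f2, f3 = normalize(feat_2d), normalize(feat_3d)
    # (N, N) cosine-similarity matrix, scaled by the temperature
    logits = f3 @ f2.T / temperature
    n = logits.shape[0]

    def cross_entropy_diag(lg):
        # log-softmax per row with the max subtracted for numerical stability
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the 3D->2D and 2D->3D directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Under this formulation, perfectly aligned pairs drive the loss toward zero, while mismatched pairs are penalized; in the alternating scheme described above, such a term would pull each Gaussian's semantic feature toward its 2D counterpart while the instance indices keep the assignment view-consistent.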