3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current work on language Gaussian Splatting falls into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) generalizable approaches. However, most of these methods are evaluated only on rendered 2D views of a handful of scenes, from viewpoints close to the training views, which limits insight into their holistic 3D understanding. To address this gap, we propose the first large-scale benchmark that systematically assesses these three groups of methods directly in 3D space, evaluating on 1060 scenes across three indoor datasets and one outdoor dataset. Benchmark results demonstrate a clear advantage of the generalizable paradigm, particularly in relaxing the scene-specific limitation, enabling fast feed-forward inference on novel scenes, and achieving superior segmentation performance. We further introduce GaussianWorld-49K, a carefully curated 3DGS dataset comprising around 49K diverse indoor and outdoor scenes obtained from multiple sources, with which we demonstrate that the generalizable approach can harness strong data priors. Our code, benchmark, and datasets are released at https://scenesplatpp.gaussianworld.ai/.