3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current work on language Gaussian splatting falls into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) generalizable approaches. However, most of these methods are evaluated only on rendered 2D views of a handful of scenes, from viewpoints close to the training views, which limits insight into their holistic 3D understanding. To address this gap, we propose the first large-scale benchmark that systematically assesses these three groups of methods directly in 3D space, evaluating on 1060 scenes across three indoor datasets and one outdoor dataset. Benchmark results demonstrate a clear advantage of the generalizable paradigm: it relaxes the scene-specific limitation, enables fast feed-forward inference on novel scenes, and achieves superior segmentation performance. We further introduce GaussianWorld-49K, a carefully curated 3DGS dataset comprising around 49K diverse indoor and outdoor scenes obtained from multiple sources, with which we demonstrate that the generalizable approach can harness strong data priors. Our code, benchmark, and datasets are released at https://scenesplatpp.gaussianworld.ai/.