Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Given that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? Because VFMs differ in architecture and training protocol (i.e., objectives, proxy tasks), a unified framework is needed to probe their 3D awareness fairly and comprehensively. Existing work on 3D probing relies on single-view 2.5D estimation (e.g., depth and normal) or two-view sparse 2D correspondence (e.g., matching and tracking). Unfortunately, these tasks ignore texture awareness and require 3D data as ground truth, which limits the scale and diversity of their evaluation sets. To address these issues, we introduce Feat2GS, which reads out 3D Gaussian attributes from VFM features extracted from unposed images. This allows us to probe 3D awareness of geometry and texture via novel view synthesis, without requiring 3D data. Additionally, the disentanglement of 3DGS parameters into geometry ($\boldsymbol{x}$, $\alpha$, $\Sigma$) and texture ($\boldsymbol{c}$) enables separate analysis of texture and geometry awareness. Under Feat2GS, we conduct extensive experiments to probe the 3D awareness of several VFMs and investigate the ingredients that lead to a 3D-aware VFM. Building on these findings, we develop several variants that achieve state-of-the-art performance across diverse datasets. This makes Feat2GS useful both for probing VFMs and as a simple yet effective baseline for novel view synthesis. Code and data are available at https://fanegg.github.io/Feat2GS/.
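To make the readout idea concrete, the following is a minimal NumPy sketch of how per-pixel features might be mapped to the disentangled Gaussian attributes. It is an illustration under stated assumptions, not the Feat2GS implementation: the two-layer MLP heads, the feature and hidden dimensions, the function names (`init_readout`, `read_out_gaussians`), and the choice of activations are all hypothetical. The covariance $\Sigma$ is assembled from a scale vector and a unit quaternion, as is standard in 3DGS.

```python
import numpy as np

def init_readout(feat_dim, hidden_dim, out_dim, rng):
    """Weights of a small two-layer MLP (hypothetical readout head)."""
    return {
        "w1": rng.normal(0.0, 0.02, (feat_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.02, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(params, feats):
    """Apply the two-layer MLP with a ReLU in between."""
    hidden = np.maximum(feats @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quat_to_rotmat(q):
    """Convert unit quaternions (N, 4) in (w, x, y, z) order to (N, 3, 3)."""
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    rows = [
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], axis=-1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], axis=-1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], axis=-1),
    ]
    return np.stack(rows, axis=1)

def read_out_gaussians(feats, geo_params, tex_params):
    """Map features (N, D) to one Gaussian per feature vector.

    The geometry head predicts x, alpha, and Sigma; the texture head
    predicts c. Keeping the heads separate mirrors the geometry/texture
    split described in the text.
    """
    geo = mlp(geo_params, feats)   # (N, 11): 3 position, 1 opacity, 3 scale, 4 rotation
    tex = mlp(tex_params, feats)   # (N, 3): RGB color

    x = geo[:, 0:3]                                   # position
    alpha = sigmoid(geo[:, 3:4])                      # opacity in (0, 1)
    scale = np.exp(geo[:, 4:7])                       # strictly positive scales
    quat = geo[:, 7:11]
    quat = quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8)

    rot = quat_to_rotmat(quat)                        # (N, 3, 3)
    rs = rot * scale[:, None, :]                      # R @ diag(scale)
    sigma = rs @ rs.transpose(0, 2, 1)                # Sigma = R S S^T R^T

    c = sigmoid(tex)                                  # color in (0, 1)
    return {"x": x, "alpha": alpha, "Sigma": sigma, "c": c}

# Example with random stand-in features (hypothetical sizes).
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))                      # 5 pixels, 16-dim features
geo_params = init_readout(16, 32, 11, rng)
tex_params = init_readout(16, 32, 3, rng)
gaussians = read_out_gaussians(feats, geo_params, tex_params)
```

In a probing setup of this kind, one head could be swapped for freely optimized parameters to isolate how much the features contribute to geometry or to texture; the sketch leaves that choice, along with rendering and optimization, out of scope.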
