3D Gaussian Splatting (3DGS) has become the de facto 3D representation in many vision tasks, which calls for 3D understanding directly in this representation space. To facilitate research in this direction, we first build a large-scale dataset of 3DGS from the commonly used ShapeNet and ModelNet datasets. Our dataset, ShapeSplat, consists of 65K objects from 87 unique categories, with labels that follow those of the respective source datasets. Creating this dataset required the equivalent of 2 GPU years of compute on a TITAN XP GPU. We use our dataset for unsupervised pretraining and supervised finetuning on classification and segmentation tasks. To this end, we introduce \textbf{\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids differs significantly from that of the uniformly sampled point cloud used for initialization; (2) this change in distribution degrades classification but improves segmentation when only the centroids are used; (3) to leverage the additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, together with a splats pooling layer, offering a tailored solution that effectively groups and embeds similar Gaussians and yields notable improvements on finetuning tasks.
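The grouping idea in insight (3) can be illustrated with a minimal sketch: heterogeneous Gaussian parameters are first normalized per channel so that centroid, scale, opacity, and color contribute comparably to distances, then nearby splats are gathered into groups and pooled into group tokens. This is only an assumed, simplified illustration of feature-space grouping with pooling; the function name, parameter layout, and use of random group centers are all hypothetical and not the actual Gaussian-MAE implementation (which would, e.g., use learned embeddings and farthest-point sampling).

```python
import numpy as np

def group_gaussians(feats, num_groups=4, group_size=8, seed=0):
    """Group splats by nearest neighbors in a normalized feature space,
    then max-pool each group into one token (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Per-channel normalization: zero mean, unit variance, so that
    # heterogeneous Gaussian parameters are comparable under L2 distance.
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    normed = (feats - mu) / sigma
    # Pick random group centers (a stand-in for farthest-point sampling).
    centers = normed[rng.choice(len(feats), num_groups, replace=False)]
    tokens = []
    for c in centers:
        d = np.linalg.norm(normed - c, axis=1)
        idx = np.argsort(d)[:group_size]        # k nearest splats
        tokens.append(normed[idx].max(axis=0))  # pooled group token
    return np.stack(tokens)

# 100 splats, each with an assumed 14-dim parameter vector
# (3 centroid + 3 scale + 4 rotation + 1 opacity + 3 color).
feats = np.random.default_rng(1).normal(size=(100, 14))
tokens = group_gaussians(feats)
print(tokens.shape)  # (4, 14)
```

Grouping in the normalized space rather than on raw centroids is the key point: splats that are spatially close but have very different opacity or scale can land in different groups, so each pooled token summarizes genuinely similar Gaussians.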