3D Gaussians have recently emerged as an effective scene representation for real-time splatting and accurate novel-view synthesis, motivating several works to adapt multi-view structure prediction networks to regress per-pixel 3D Gaussians from images. However, most prior work extends these networks to predict additional Gaussian parameters -- orientation, scale, opacity, and appearance -- while relying almost exclusively on view-synthesis supervision. We show that a view-synthesis loss alone is insufficient to recover geometrically meaningful splats in this setting. We analyze and address the ambiguities of learning 3D Gaussian splats under self-supervision for pose-free generalizable splatting, and introduce G3Splat, which enforces geometric priors to obtain geometrically consistent 3D scene representations. Trained on RE10K, our approach achieves state-of-the-art performance in (i) geometrically consistent reconstruction, (ii) relative pose estimation, and (iii) novel-view synthesis. We further demonstrate strong zero-shot generalization on ScanNet, substantially outperforming prior work in both geometry recovery and relative pose estimation. Code and pretrained models are released on our project page (https://m80hz.github.io/g3splat/).