3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on Gaussian splats. SplatFormer takes as input an initial 3DGS set optimized under limited training views and refines it in a single forward pass, effectively removing potential artifacts in OOD test views. To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.
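The refinement step described above (a transformer ingesting an optimized 3DGS set and emitting a corrected set in one forward pass) can be sketched minimally as follows. This is an illustrative toy, not the paper's architecture: the single self-attention layer, the 16-dimensional packed parameter vector, and all weight names are assumptions made for the example.

```python
# Hedged sketch: refine a set of Gaussian-splat parameter vectors with one
# self-attention pass, in the spirit of a single-forward-pass point transformer.
# Shapes, dimensions, and weights are illustrative assumptions, not the paper's.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine_splats(splats, Wq, Wk, Wv, Wo):
    """One attention pass over N splats; returns splats plus a predicted residual,
    so the set size is preserved and only the parameters are updated."""
    q, k, v = splats @ Wq, splats @ Wk, splats @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N) mixing across splats
    residual = (attn @ v) @ Wo                      # per-splat correction
    return splats + residual                        # refined 3DGS set

rng = np.random.default_rng(0)
N, D = 8, 16  # 8 Gaussians, 16-dim packed params (position, scale, color, ...)
splats = rng.normal(size=(N, D))
weights = [rng.normal(scale=0.1, size=(D, D)) for _ in range(4)]
refined = refine_splats(splats, *weights)
print(refined.shape)  # same set size: (8, 16)
```

The residual formulation mirrors the described use case: the optimized-but-artifact-prone 3DGS set is the starting point, and the network only predicts corrections rather than regenerating the scene from scratch.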