Novel view synthesis has evolved rapidly, advancing from Neural Radiance Fields to 3D Gaussian Splatting (3DGS), which offers real-time rendering and rapid training without compromising visual fidelity. However, 3DGS relies heavily on accurate camera poses and high-quality point cloud initialization, both of which are difficult to obtain in sparse-view scenarios. Traditional Structure from Motion (SfM) pipelines often fail in these settings, and existing learning-based point estimation alternatives typically require reliable reference views and remain sensitive to pose and depth errors. In this work, we propose a robust method built on π^3, a reference-free point cloud estimation network. We integrate dense initialization from π^3 with a regularization scheme designed to mitigate geometric inaccuracies; specifically, we employ uncertainty-guided depth supervision, a normal consistency loss, and depth warping. Experimental results demonstrate that our approach achieves state-of-the-art performance on the Tanks and Temples, LLFF, DTU, and MipNeRF360 datasets.
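The abstract mentions uncertainty-guided depth supervision as one of the regularizers. As an illustration only, the sketch below shows a generic heteroscedastic-style depth loss in PyTorch, where per-pixel depth residuals are down-weighted by the prior's uncertainty; the function name and exact form are assumptions for exposition, not the paper's actual formulation.

```python
import torch

def uncertainty_guided_depth_loss(pred_depth: torch.Tensor,
                                  prior_depth: torch.Tensor,
                                  uncertainty: torch.Tensor,
                                  eps: float = 1e-6) -> torch.Tensor:
    """Depth supervision weighted by the prior's per-pixel uncertainty.

    Hypothetical sketch: pixels where the estimated depth prior is
    unreliable (high `uncertainty`) contribute less, while a log term
    penalizes trivially inflating the uncertainty everywhere.
    """
    residual = torch.abs(pred_depth - prior_depth)
    return (residual / (uncertainty + eps) + torch.log(uncertainty + eps)).mean()
```

With unit uncertainty this reduces to a plain mean absolute depth error, so the weighting only changes behavior where the prior's confidence varies across pixels.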