Accurate surface reconstruction from unposed images is crucial for efficient 3D object and scene creation. However, it remains challenging, particularly because of the joint camera pose estimation. Previous approaches have achieved impressive pose-free surface reconstruction results in dense-view settings, but can easily fail in sparse-view scenarios without sufficient visual overlap. In this paper, we propose a new technique for pose-free surface reconstruction, which follows triplane-based signed distance field (SDF) learning but regularizes the learning with explicit points sampled from ray-based diffusion of camera pose estimation. Our key contribution is a novel Geometric Consistent Ray Diffusion model (GCRayDiffusion), in which we represent camera poses as neural bundle rays and regress the distribution of noisy rays via a diffusion model. More importantly, we further condition the denoising process of GCRayDiffusion on the triplane-based SDF of the entire scene, which provides effective 3D-consistent regularization to achieve multi-view consistent camera pose estimation. Finally, we incorporate GCRayDiffusion into the triplane-based SDF learning by introducing on-surface geometric regularization from the sampled points of the neural bundle rays, which leads to highly accurate pose-free surface reconstruction results even for sparse-view inputs. Extensive evaluations on public datasets show that GCRayDiffusion achieves more accurate camera pose estimation than previous approaches, with geometrically more consistent surface reconstruction results, especially given sparse-view inputs.