Accurate meshing from monocular images remains a key challenge in 3D vision. While state-of-the-art 3D Gaussian Splatting (3DGS) methods excel at synthesizing photorealistic novel views through rasterization-based rendering, their reliance on sparse, explicit primitives severely limits their ability to recover watertight and topologically consistent 3D surfaces. We introduce MonoGSDF, a novel method that couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. During training, the SDF guides the spatial distribution of the Gaussians, while at inference, the Gaussians serve as priors for surface reconstruction, eliminating the need for memory-intensive Marching Cubes. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. A multi-resolution training scheme further refines details, and monocular geometric cues from off-the-shelf estimators enhance reconstruction quality. Experiments on real-world datasets show that MonoGSDF outperforms prior methods while maintaining efficiency.
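The abstract states that during training the SDF guides the Gaussians' spatial distribution. As an illustrative sketch only (the paper's actual coupling uses a neural SDF and is not specified here), the following uses an analytic sphere SDF as a stand-in to show how an SDF can regularize Gaussian centers toward its zero level set; the function names and the loss form are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance from each point to a sphere surface (analytic stand-in
    for the neural SDF in the paper)."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sdf_guidance_loss(gaussian_centers, sdf_fn):
    """Hypothetical guidance term: penalize Gaussian centers that drift
    away from the SDF zero level set (the implied surface)."""
    return np.mean(np.abs(sdf_fn(gaussian_centers)))

# A few hypothetical Gaussian centers: one on the surface, two off it.
centers = np.array([[1.0, 0.0, 0.0],   # on the unit sphere -> sdf =  0
                    [2.0, 0.0, 0.0],   # outside            -> sdf =  1
                    [0.0, 0.0, 0.0]])  # at the origin      -> sdf = -1
loss = sdf_guidance_loss(centers, sphere_sdf)
print(round(loss, 6))  # mean of |0|, |1|, |-1| = 0.666667
```

Minimizing such a term during optimization would pull primitives onto the current surface estimate, which is one plausible reading of "the SDF guides the Gaussians' spatial distribution."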