Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the scene's geometry directly from the Gaussian properties remains a challenge, as these properties are optimized with a purely photometric loss. While some concurrent models add geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach that bridges the gap between the noisy 3DGS representation and a smooth 3D mesh representation by injecting real-world knowledge into the depth extraction process. Rather than extracting the scene's geometry directly from the Gaussian properties, we extract it through a pre-trained stereo-matching model: we render stereo-aligned pairs of images corresponding to the original training poses, feed each pair into the stereo model to obtain a depth profile, and finally fuse all of the profiles into a single mesh. The resulting reconstruction is smoother, more accurate, and shows more intricate details than other methods for surface reconstruction from Gaussian Splatting, while requiring only a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes captured with a smartphone, showcasing its superior reconstruction abilities. Additionally, we evaluated the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.
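To make the pipeline concrete, the sketch below illustrates the two standard geometric operations it relies on: converting a stereo disparity map to depth (depth = f · B / disparity, given the rendered pair's focal length and baseline) and back-projecting per-view depth maps into a shared world frame before fusion. This is a minimal numpy illustration under assumed pinhole intrinsics, not the paper's actual implementation; the final step of the method would replace the naive point-cloud concatenation with proper TSDF fusion and mesh extraction.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Standard rectified-stereo relation: depth = f * B / disparity."""
    return focal_px * baseline_m / np.maximum(disparity, eps)

def backproject(depth, K, cam_to_world):
    """Lift an (h, w) depth map to world-space points via the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)
    pts_world = pts_cam.reshape(-1, 4) @ cam_to_world.T  # 4x4 camera-to-world pose
    return pts_world[:, :3]

def fuse_views(depth_maps, intrinsics, poses):
    """Naive fusion: concatenate per-view point clouds. A real pipeline
    would integrate the depths into a TSDF and run marching cubes."""
    return np.concatenate(
        [backproject(d, K, T) for d, K, T in zip(depth_maps, intrinsics, poses)],
        axis=0,
    )
```

The second camera of each stereo-aligned pair would be obtained by translating the training pose along its x-axis by the chosen baseline, so the disparities predicted by the stereo model map directly to metric-consistent depth via `disparity_to_depth`.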