3D scene reconstruction from 2D images is a long-standing task. Instead of estimating per-frame depth maps and fusing them in 3D, recent research uses a neural implicit surface as a unified representation for 3D reconstruction. Equipped with geometric cues from data-driven pre-trained models, these methods have demonstrated promising performance. However, errors in the estimated priors, which are usually unavoidable, can degrade reconstruction quality, particularly in geometrically complex regions. In this paper, we propose a two-stage training process, decouple view-dependent and view-independent colors, and leverage two novel consistency constraints to enhance detail reconstruction without requiring extra priors. Additionally, we introduce an essential mask scheme that adaptively influences the selection of supervision constraints, thereby improving performance in a self-supervised paradigm. Experiments on synthetic and real-world datasets show that our method reduces the interference from prior estimation errors and achieves high-quality scene reconstruction with rich geometric details.