Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. Despite their success, existing methods fail to capture fine geometric details and thin structures, especially when only sparse RGB views of the objects of interest are available. We hypothesize that current methods for learning neural implicit representations from RGB or RGBD images produce 3D surfaces with missing parts and details because they rely only on zeroth-order differential properties, i.e., the 3D surface points and their projections, as supervisory signals. Such properties do not capture the local 3D geometry around the points, and they ignore the interactions between points. This paper demonstrates that training neural representations with first-order differential properties, i.e., surface normals, leads to highly accurate 3D surface reconstruction even when as few as two RGB images (front and back) are available. Given multiview RGB images of an object of interest, we first compute approximate surface normals in image space from the gradients of depth maps produced by an off-the-shelf monocular depth estimator such as the Depth Anything model. An implicit surface regressor is then trained with a loss function that enforces the first-order differential properties of the regressed surface to match those estimated from Depth Anything. Our extensive experiments on a wide range of real and synthetic datasets show that the proposed method achieves an unprecedented level of reconstruction accuracy even when using as few as two RGB views. A detailed ablation study also demonstrates that normal-based supervision plays a key role in this improvement, enabling the 3D reconstruction of intricate geometric details and thin structures that were previously challenging to capture.
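The step that derives surface normals from depth-map gradients can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an orthographic approximation (no camera intrinsics), treats the depth map as a height field so that the unnormalized normal is (-dz/dx, -dz/dy, 1), and uses a cosine-based loss as one common way to compare predicted and estimated normals. The function names `normals_from_depth` and `normal_loss` are hypothetical, and the abstract does not specify the sign convention or the exact loss.

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Approximate per-pixel unit normals from a depth map.

    depth: array of shape (H, W). Returns an array of shape (H, W, 3).
    Assumption: orthographic approximation, unit pixel spacing.
    """
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    # Height-field convention (an assumption): n is proportional to (-z_x, -z_y, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    # The z component is always 1, so the norm is never zero.
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n


def normal_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between two fields of unit normals.

    This is an illustrative choice of loss, not the one from the paper.
    """
    cos = np.sum(pred * target, axis=-1)
    return float(np.mean(1.0 - cos))


if __name__ == "__main__":
    # A planar ramp z = 2x has the same normal at every pixel.
    ramp = np.tile(2.0 * np.arange(5), (4, 1))
    normals = normals_from_depth(ramp)
    print(normals[0, 0])
    print(normal_loss(normals, normals))
```

In a training setup of the kind the abstract describes, the `pred` normals would come from the implicit surface regressor (for a signed distance field, typically the normalized spatial gradient of the network output), while the `target` normals would come from the monocular depth estimate.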