Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views. Most existing approaches estimate only a pixel-wise depth value by minimizing the gap between the predicted point and the intersection of the ray with the surface, and therefore usually ignore surface topology. Surface topology, however, is essential for textureless regions and surface boundaries, which are otherwise difficult to reconstruct properly. To address this issue, we propose to exploit the point-to-surface distance so that the model can perceive a wider range of the surface. To this end, we predict a distance volume from the cost volume to estimate the signed distance of points around the surface. Our proposed RA-MVSNet is patch-aware, since its perception range is enlarged by associating hypothetical planes with a patch of the surface. As a result, it improves the completeness of textureless regions and reduces outliers at boundaries. Moreover, the introduced distance volume makes it possible to generate mesh topologies with fine details. Compared with conventional deep learning-based multi-view stereo methods, RA-MVSNet obtains more complete reconstruction results by taking advantage of signed distance supervision. Experiments on both the DTU and Tanks & Temples datasets demonstrate that our approach achieves state-of-the-art results.
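To make the distinction between the along-ray depth gap and the point-to-surface distance concrete, the following is a minimal sketch and not the RA-MVSNet implementation. It assumes a locally planar surface patch, a camera centre at the origin, and depth hypotheses measured as distances along a unit ray; the function name `ray_gap_and_signed_distance` and all parameters are hypothetical and introduced only for illustration.

```python
import numpy as np


def ray_gap_and_signed_distance(ray_dir, depths, plane_point, plane_normal):
    """Compare two error signals for depth hypotheses along one ray.

    Assumptions (illustrative only): the camera centre is the origin, the
    surface is a plane through `plane_point` with normal `plane_normal`
    facing the camera, and `depths` are distances along the unit ray.
    """
    d = ray_dir / np.linalg.norm(ray_dir)
    n = plane_normal / np.linalg.norm(plane_normal)

    # Distance along the ray at which it intersects the plane.
    t_hit = np.dot(n, plane_point) / np.dot(n, d)

    # 3D position of every depth hypothesis on the ray, shape (K, 3).
    points = depths[:, None] * d[None, :]

    # Conventional signal: gap to the ray-surface intersection.
    ray_gap = depths - t_hit

    # Point-to-surface signal: signed distance to the plane,
    # positive on the camera side and negative behind the surface.
    signed_dist = (points - plane_point) @ n
    return t_hit, ray_gap, signed_dist


depths = np.linspace(1.0, 3.0, 5)            # hypothetical depth planes
ray_dir = np.array([0.0, 0.0, 1.0])          # viewing ray along +z
plane_point = np.array([0.0, 0.0, 2.0])      # a point on the surface patch
plane_normal = np.array([1.0, 0.0, -1.0])    # patch tilted 45 degrees, facing the camera

t_hit, ray_gap, signed_dist = ray_gap_and_signed_distance(
    ray_dir, depths, plane_point, plane_normal
)
```

In this toy setting the signed distance equals the ray gap scaled by the cosine between ray and normal, so the two signals differ whenever the surface is oblique to the viewing ray. The signed distance reflects the orientation of the surrounding patch, whereas the ray gap depends only on the single intersection point.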