Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views. Most existing approaches estimate only pixel-wise depth by minimizing the gap between the predicted point and the intersection of the ray with the surface, which usually ignores surface topology. Surface topology is essential for textureless regions and surface boundaries, which otherwise cannot be properly reconstructed. To address this issue, we propose to exploit the point-to-surface distance so that the model can perceive a wider range of the surface. To this end, we predict a distance volume from the cost volume to estimate the signed distance of points around the surface. The proposed RA-MVSNet is patch-aware, since its perception range is enlarged by associating hypothetical planes with a patch of the surface. It therefore improves the completeness of textureless regions and reduces outliers at boundaries. Moreover, the introduced distance volume allows mesh topologies with fine details to be generated. Compared with conventional deep learning-based multi-view stereo methods, RA-MVSNet obtains more complete reconstructions by taking advantage of signed distance supervision. Experiments on both the DTU and Tanks and Temples datasets demonstrate that the proposed approach achieves state-of-the-art results.
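To make the notion of a signed point-to-surface distance concrete, the following is a minimal NumPy sketch of how a supervision target of this kind could be computed for the depth hypotheses along a single pixel ray. It is an illustration under stated assumptions, not the RA-MVSNet implementation: the function name `signed_distance_to_patch`, the brute-force nearest-neighbor search over a sampled surface patch, and the sign convention (positive in front of the surface, negative behind it) are all hypothetical choices made for this example.

```python
import numpy as np


def signed_distance_to_patch(ray_origin, ray_dir, depth_hyps, surface_depth, patch_points):
    """Signed point-to-surface distance for depth hypotheses along one ray.

    Hypothetical helper for illustration only (not the authors' code).
    ray_origin:    (3,) camera center
    ray_dir:       (3,) ray direction (normalized inside the function)
    depth_hyps:    (D,) hypothesized distances along the ray
    surface_depth: distance along the ray at which it hits the surface
    patch_points:  (N, 3) points sampled from a patch of the true surface
    """
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    # One 3D point per depth hypothesis, shape (D, 3).
    pts = ray_origin[None, :] + depth_hyps[:, None] * ray_dir[None, :]
    # Unsigned distance from each hypothesis point to its nearest patch sample.
    diff = pts[:, None, :] - patch_points[None, :, :]
    unsigned = np.linalg.norm(diff, axis=-1).min(axis=1)
    # Assumed sign convention: positive in front of the surface, negative behind.
    sign = np.where(depth_hyps <= surface_depth, 1.0, -1.0)
    return sign * unsigned


# Toy scene: a fronto-parallel planar patch at z = 2 seen from the origin.
xs = np.linspace(-0.1, 0.1, 5)
gx, gy = np.meshgrid(xs, xs)
patch = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, 2.0)], axis=1)

sd = signed_distance_to_patch(
    ray_origin=np.zeros(3),
    ray_dir=np.array([0.0, 0.0, 1.0]),
    depth_hyps=np.array([1.5, 2.0, 2.5]),
    surface_depth=2.0,
    patch_points=patch,
)
print(sd)
```

In this toy case the ray is perpendicular to the patch, so the point-to-surface distance coincides with the gap along the ray. For a slanted surface the nearest patch point generally lies off the ray, which is where a target of this kind carries information about the neighboring surface that a per-pixel depth residual does not.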