SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Research on multi-view stereo (MVS) based on remote sensing images has advanced large-scale urban 3D reconstruction. However, remote sensing multi-view images suffer from occlusion and uneven brightness across views introduced during acquisition, which blurs details in the estimated depth. To address this, we revisit deformable learning for the MVS task and propose SDL-MVS, a novel paradigm based on view Space and Depth deformable Learning. SDL-MVS learns deformable interactions of features across different view spaces and deformably models depth ranges and intervals, enabling highly accurate depth estimation. Specifically, to suppress the view noise caused by occlusion and uneven brightness, we propose a Progressive Space deformable Sampling (PSS) mechanism, which progressively learns deformable sampling points in the 3D frustum space and the 2D image space to adaptively embed source features into the reference features. To further refine depth, we introduce Depth Hypothesis deformable Discretization (DHD), which precisely localizes the depth prior by adaptively adjusting the depth range hypothesis and deformably discretizing the depth interval hypothesis. Through this deformable learning paradigm over view space and depth, SDL-MVS explicitly models the occlusion and uneven brightness encountered in multi-view stereo and achieves accurate multi-view depth estimation. Extensive experiments on the LuoJia-MVS and WHU datasets show that SDL-MVS achieves state-of-the-art performance. Notably, with three input views, SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6 m, and 98.9% for <3-interval on the LuoJia-MVS dataset.
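The abstract describes PSS only at a high level. As a hedged illustration of the basic operation such a mechanism builds on in 2D image space, the sketch below takes the location where a reference pixel projects into a source view, samples the source feature map at K offset positions around it with bilinear interpolation, and blends the samples with softmax-normalized weights. The function names, the [H][W][C] list layout, and zero padding outside the image are assumptions; the offsets and weight scores, which a network would predict, are plain inputs here, and neither the 3D frustum stage nor the progressive coupling of the two stages is modeled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names for illustration; not taken from the SDL-MVS code.
FeatureMap = List[List[List[float]]]  # indexed as feat[y][x][channel]


def bilinear_sample(feat: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly sample a [H][W][C] feature map at continuous (x, y).

    Neighbouring pixels that fall outside the map contribute zero (zero padding).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * c
    for yi, xi in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yi < h and 0 <= xi < w:
            # Standard bilinear weight: closer neighbours get larger weights.
            wgt = (1.0 - abs(x - xi)) * (1.0 - abs(y - yi))
            for k in range(c):
                out[k] += wgt * feat[yi][xi][k]
    return out


def deformable_sample(
    src_feat: FeatureMap,
    proj_xy: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    weight_logits: Sequence[float],
) -> List[float]:
    """Blend K deformed samples taken around a projected location.

    proj_xy: where the reference pixel lands in the source view for one depth
        hypothesis (assumed to be computed elsewhere from the camera model).
    offsets, weight_logits: per-sample (dx, dy) offsets and scores that a
        network would predict; here they are plain, hypothetical inputs.
    """
    # Softmax over the sample scores so the blend weights sum to one.
    m = max(weight_logits)
    exps = [math.exp(s - m) for s in weight_logits]
    total = sum(exps)
    c = len(src_feat[0][0])
    out = [0.0] * c
    for (dx, dy), e in zip(offsets, exps):
        sample = bilinear_sample(src_feat, proj_xy[0] + dx, proj_xy[1] + dy)
        for k in range(c):
            out[k] += (e / total) * sample[k]
    return out
```

For example, with `offsets=[(0.0, 0.0), (1.0, 0.0)]` and equal scores, the output is the average of the features at the projected point and one pixel to its right. Normalizing the weights lets a sample that falls on an occluded or badly lit region be down-weighted, which is one plausible reading of "adaptively embedding source features" rather than a description of the paper's exact layer.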
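As a rough illustration of what "adaptively adjusting the depth range hypothesis" and "deformable discretization of the depth interval hypothesis" could mean for a single pixel, the sketch below centers a search range on a coarse depth prior, rescales its half-width, and splits it into non-uniform intervals. It is not the authors' implementation: the function name, the prior ± scaled half-range parameterization, and the softmax-normalized interval logits are all assumptions, and in a network the scale and logits would come from learned layers.

```python
import math
from typing import List

# Hypothetical DHD-style sketch; names and parameterization are assumptions.


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_depth_hypotheses(
    depth_prior: float,
    base_half_range: float,
    range_scale: float,
    interval_logits: List[float],
) -> List[float]:
    """Return len(interval_logits) + 1 increasing depth hypotheses.

    1. Adaptive range: center the range on the prior depth and rescale its
       half-width by a positive factor (assumed to be predicted per pixel).
    2. Deformable discretization: the interval widths are softmax weights of
       the logits times the range length, so they are positive, sum to the
       full range, and can be made finer where the logits are small.
    """
    if range_scale <= 0:
        raise ValueError("range_scale must be positive")
    half = base_half_range * range_scale
    d_min, d_max = depth_prior - half, depth_prior + half
    widths = [w * (d_max - d_min) for w in softmax(interval_logits)]
    hypotheses = [d_min]
    for width in widths:
        hypotheses.append(hypotheses[-1] + width)
    return hypotheses


# Usage: equal logits reproduce uniform sampling over [98, 102].
uniform = deformable_depth_hypotheses(100.0, 4.0, 0.5, [0.0, 0.0, 0.0, 0.0])
# Larger outer logits widen the outer intervals and pack hypotheses near the prior.
focused = deformable_depth_hypotheses(100.0, 4.0, 1.0, [2.0, 0.0, 0.0, 2.0])
```

Here `uniform` evaluates to `[98.0, 99.0, 100.0, 101.0, 102.0]`. Routing the widths through a softmax keeps the hypotheses sorted and always spanning exactly the adjusted range; whether SDL-MVS uses this particular parameterization is not stated in the abstract.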
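To make the reported numbers concrete, the sketch below shows one common way such metrics are computed over valid ground-truth pixels, assuming depths in meters: MAE is the mean absolute depth error, "<0.6 m" is the fraction of pixels whose error is below 0.6 m, and "<3-interval" is the fraction whose error is below three depth-sampling intervals. The function name, the optional validity mask, and treating the interval as one scalar are assumptions; the depth interval used for LuoJia-MVS is not given in the abstract, so it is a parameter here. Strict inequalities follow the abstract's "<" notation.

```python
from typing import Dict, Optional, Sequence

# Hypothetical evaluation helper; not the official LuoJia-MVS or WHU script.


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    depth_interval: float,
    abs_threshold: float = 0.6,
    interval_multiple: float = 3.0,
    valid: Optional[Sequence[bool]] = None,
) -> Dict[str, float]:
    """MAE and threshold accuracies over valid pixels of flattened depth maps."""
    if valid is None:
        valid = [True] * len(gt)
    # Absolute errors on pixels that have usable ground truth.
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels")
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        # Fraction of pixels with error below a fixed metric threshold.
        "acc_abs": sum(e < abs_threshold for e in errors) / n,
        # Fraction of pixels with error below k depth-sampling intervals.
        "acc_interval": sum(e < interval_multiple * depth_interval for e in errors) / n,
    }
```

For instance, predictions `[10.0, 10.5, 12.0, 9.0]` against a constant ground truth of 10.0 with a 0.1 m interval give an MAE of 0.875, 50% for <0.6 m, and 25% for <3-interval.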
