Stereo matching, a critical step in 3D reconstruction, has fully shifted towards deep learning owing to its strong feature representation of remote sensing images. However, ground truth for the stereo matching task relies on expensive airborne LiDAR data, which makes it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, we study key training factors from three perspectives. (1) For training dataset selection, it is more important to choose data whose regional target distribution is similar to that of the test set than to use data from the same sensor. (2) For model structure, a cascaded structure that flexibly adapts to features of different sizes is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy that helps retain the best model, with pre-trained weights as the basis. Extensive experiments support these findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.
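The abstract does not say which criterion the unsupervised early-stop strategy monitors, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a label-free validation score where lower is better (for example, a photometric reconstruction error on unlabeled target-domain stereo pairs), and it treats epoch 0 as the pre-trained weights evaluated before any fine-tuning. The names `UnsupervisedEarlyStopper`, `run_with_early_stop`, and `patience` are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class UnsupervisedEarlyStopper:
    """Keep the best model state according to a label-free validation score.

    Hypothetical helper: `score` is assumed to be an unsupervised quantity
    (lower is better) that needs no LiDAR ground truth.
    """
    patience: int = 3                      # evaluations without improvement before stopping
    best_score: float = float("inf")
    best_state: Optional[Any] = None
    best_epoch: int = -1
    _bad_epochs: int = field(default=0, repr=False)

    def update(self, epoch: int, score: float, state: Any) -> bool:
        """Record one evaluation; return True when training should stop."""
        if score < self.best_score:
            # Improvement: snapshot the state so later updates cannot overwrite it.
            self.best_score = score
            self.best_state = copy.deepcopy(state)
            self.best_epoch = epoch
            self._bad_epochs = 0
        else:
            self._bad_epochs += 1
        return self._bad_epochs >= self.patience


def run_with_early_stop(scores, states, patience=3):
    """Replay a sequence of (score, state) evaluations through the stopper.

    Index 0 is assumed to be the pre-trained model, so it serves as the
    baseline that fine-tuned epochs must beat.
    """
    stopper = UnsupervisedEarlyStopper(patience=patience)
    for epoch, (score, state) in enumerate(zip(scores, states)):
        if stopper.update(epoch, score, state):
            break
    return stopper
```

In a real training loop, `state` would typically be a copy of the network weights and `score` would be computed on unlabeled validation pairs after each epoch. If fine-tuning never improves on the pre-trained baseline, the stopper simply returns the pre-trained state.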