Because real samples and ground truth are difficult to obtain, generalization performance and fine-tuned performance are critical to the feasibility of stereo matching methods in real-world applications. However, substantial differences in disparity distribution and density across datasets pose significant challenges for model generalization and fine-tuning. In this paper, we propose a novel stereo matching method, called SR-Stereo, which mitigates distributional differences across datasets by predicting disparity clips and uses a loss weight related to the scale of the regression target to improve the accuracy of those clips. Moreover, this stepwise regression architecture can be easily extended to existing iteration-based methods to improve performance without changing their structure. In addition, to mitigate the edge blurring that arises when a model is fine-tuned on sparse ground truth, we propose Domain Adaptation Based on Pre-trained Edges (DAPE). Specifically, we use the predicted disparity and the RGB image to estimate an edge map of the target-domain image. The edge map is filtered to generate edge map background pseudo-labels, which, together with the sparse ground-truth disparity on the target domain, are used as supervision to jointly fine-tune the pre-trained stereo matching model. The proposed methods are extensively evaluated on SceneFlow, KITTI, Middlebury 2014, and ETH3D. SR-Stereo achieves competitive disparity estimation performance and state-of-the-art cross-domain generalization performance. Meanwhile, DAPE significantly improves the disparity estimation performance of fine-tuned models, especially in textureless and detail regions.
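The idea of regressing bounded disparity clips, rather than the full residual to the ground truth, can be illustrated with a small NumPy sketch. This is not the SR-Stereo implementation: the function names, the clip range `m`, and in particular the form of `scale_weight` are assumptions made for illustration, since the abstract states only that the loss weight is related to the scale of the regression target.

```python
import numpy as np

def clip_target(d_gt, d_cur, m):
    """Per-step regression target: residual to ground truth, clipped to [-m, m]."""
    return np.clip(d_gt - d_cur, -m, m)

def scale_weight(target, m, alpha=1.0):
    """Hypothetical loss weight (an assumption, not the paper's formula):
    grows linearly as the target magnitude shrinks relative to m."""
    return 1.0 + alpha * (1.0 - np.abs(target) / m)

def stepwise_targets(d_gt, d_init, m, steps):
    """Simulate ideal stepwise regression, where each step predicts its clip exactly."""
    d_cur = d_init.astype(float).copy()
    history = []
    for _ in range(steps):
        t = clip_target(d_gt, d_cur, m)
        history.append(t)
        d_cur = d_cur + t  # accumulate the clip into the running disparity
    return d_cur, history

# Ground-truth disparities with very different magnitudes.
d_gt = np.array([3.0, 40.0, 150.0])
d_init = np.zeros(3)
d_final, history = stepwise_targets(d_gt, d_init, m=16.0, steps=10)
```

In this toy setting every per-step target stays within `[-m, m]` regardless of how large the ground-truth disparity is, which is the sense in which the regression target distribution becomes less dependent on the dataset's disparity range.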