Despite the remarkable progress of learning-based stereo-matching algorithms, performance in ill-conditioned regions, such as occluded regions, remains a bottleneck. Because of their limited receptive field, existing CNN-based methods struggle to handle these regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching network, the Global Occlusion-Aware Transformer (GOAT), which exploits long-range dependencies and occlusion-aware global context for disparity estimation. Within the GOAT architecture, a parallel disparity and occlusion estimation module (PDO) estimates the initial disparity map and the occlusion mask using a parallel attention mechanism. To further improve the disparity estimates in occluded regions, an occlusion-aware global aggregation module (OGA) refines them by leveraging restricted global correlation within the focus scope of the occluded areas. Extensive experiments were conducted on several public benchmark datasets, including SceneFlow, KITTI 2015, and Middlebury. The results show that GOAT performs strongly across all of these benchmarks, particularly in occluded regions.
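The idea of refining occluded disparities through a restricted global correlation can be illustrated with a toy example. The sketch below is not the GOAT implementation: it assumes a single scanline, plain dot-product attention, and a hypothetical helper `aggregate_occluded` with a hypothetical `temperature` parameter. Each pixel flagged as occluded attends only to non-occluded pixels and takes the attention-weighted average of their disparities, while visible pixels keep their initial estimate.

```python
import math


def aggregate_occluded(features, disparity, occluded, temperature=1.0):
    """Refine disparities at occluded pixels via attention over visible pixels.

    Illustrative assumption only, not the OGA module from the paper.

    features:  list of per-pixel feature vectors (lists of floats)
    disparity: list of initial per-pixel disparities
    occluded:  list of bools, True where the pixel is flagged as occluded
    """
    visible = [i for i, occ in enumerate(occluded) if not occ]
    refined = list(disparity)
    if not visible:
        return refined  # no reliable pixels to borrow from

    for i, occ in enumerate(occluded):
        if not occ:
            continue  # visible pixels keep their initial disparity

        # Dot-product similarity between the occluded query and each visible key.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / temperature
            for j in visible
        ]

        # Numerically stable softmax restricted to the visible pixels.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)

        # Attention-weighted average of the visible disparities.
        refined[i] = sum(w * disparity[j] for w, j in zip(weights, visible)) / total

    return refined


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    disp = [10.0, 99.0, 2.0, 2.0]
    occ = [False, True, False, False]
    print(aggregate_occluded(feats, disp, occ))
```

In this toy input, the occluded pixel at index 1 shares its feature vector with the visible pixel at index 0, so its refined disparity is pulled toward that pixel's value. Lowering `temperature` sharpens the attention and moves the result closer to the disparity of the best-matching visible pixel.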