Recurrent All-Pairs Field Transforms (RAFT) has shown great potential in matching tasks. However, all-pairs correlations lack non-local geometry knowledge and have difficulty resolving local ambiguities in ill-posed regions. In this paper, we propose Iterative Geometry Encoding Volume (IGEV-Stereo), a new deep network architecture for stereo matching. The proposed IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details, and iteratively indexes it to update the disparity map. To speed up convergence, we exploit the GEV to regress an accurate starting point for the ConvGRU iterations. Our IGEV-Stereo ranks $1^{st}$ on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods. In addition, IGEV-Stereo has strong cross-dataset generalization as well as high inference efficiency. We also extend our IGEV to multi-view stereo (MVS), i.e., IGEV-MVS, which achieves competitive accuracy on the DTU benchmark. Code is available at https://github.com/gangweiX/IGEV.
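As a rough illustration of the regress-then-iterate scheme summarized in the abstract, the sketch below works on a single pixel's 1-D slice of a volume (one matching score per candidate disparity). It is not the released implementation. The function names `soft_argmax_disparity`, `lookup`, and `refine` are hypothetical. Using a softmax expectation to regress the starting disparity is an assumption about how such a regression is commonly done. The update rule inside `refine` is a hand-written stand-in for the learned ConvGRU update, which cannot be reproduced without the trained network.

```python
import math


def _softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_disparity(scores):
    """Regress a sub-pixel starting disparity as the softmax-weighted
    mean of the candidate disparity indices (assumed formulation)."""
    weights = _softmax(scores)
    return sum(d * w for d, w in enumerate(weights))


def lookup(volume, disparity, radius=2):
    """Index the volume slice around the current disparity estimate,
    using linear interpolation and clamping at the borders."""
    last = len(volume) - 1
    samples = []
    for offset in range(-radius, radius + 1):
        x = min(max(disparity + offset, 0.0), float(last))
        lo = int(math.floor(x))
        hi = min(lo + 1, last)
        frac = x - lo
        samples.append((1.0 - frac) * volume[lo] + frac * volume[hi])
    return samples


def refine(volume, init, iterations=4, radius=2):
    """Iteratively index the volume and update the disparity.
    The softmax-weighted offset below is only a toy stand-in for
    the learned recurrent update, not the paper's method."""
    last = float(len(volume) - 1)
    d = init
    for _ in range(iterations):
        weights = _softmax(lookup(volume, d, radius))
        delta = sum((i - radius) * w for i, w in enumerate(weights))
        d = min(max(d + delta, 0.0), last)
    return d


if __name__ == "__main__":
    volume = [0.0, 1.0, 4.0, 1.0, 0.0]  # made-up scores, peak at disparity 2
    start = soft_argmax_disparity(volume)
    print(start, refine(volume, start))
```

The point of the sketch is the control flow: a good starting value from the regression means the iterative lookups begin near the peak, so fewer update steps are needed than when starting from zero disparity.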