In this paper, we propose CGI-Stereo, a novel neural network architecture that simultaneously achieves real-time performance, competitive accuracy, and strong generalization ability. The core of CGI-Stereo is a Context and Geometry Fusion (CGF) block, which adaptively fuses context and geometry information for more effective cost aggregation while also providing feedback to feature learning to guide more effective contextual feature extraction. The proposed CGF can be easily embedded into many existing stereo matching networks, such as PSMNet, GwcNet, and ACVNet, and the resulting networks show a significant improvement in accuracy. In particular, the model that combines our CGF with ACVNet ranks $1^{st}$ on the KITTI 2012 and 2015 leaderboards among all published methods. We further propose an informative and concise cost volume, named Attention Feature Volume (AFV), which uses a correlation volume as attention weights to filter a feature volume. Based on CGF and AFV, the proposed CGI-Stereo outperforms all other published real-time methods on the KITTI benchmarks and shows better generalization ability than other real-time methods. Code is available at https://github.com/gangweiX/CGI-Stereo.
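To make the AFV idea concrete, the following is a minimal NumPy sketch of "a correlation volume used as attention weights to filter a feature volume". It is an illustration under stated assumptions, not the authors' implementation: the function names `correlation_volume` and `attention_feature_volume` are hypothetical, the feature volume is assumed to be the left feature map repeated along the disparity axis, and the raw cosine correlation is used directly as the weight, with any learned layers omitted.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Cosine correlation between left features and disparity-shifted right features.

    left, right: feature maps of shape (C, H, W).
    Returns an array of shape (max_disp, H, W); entries with no valid match stay 0.
    """
    _, h, w = left.shape
    # L2-normalise each pixel's channel vector so the dot product is a cosine similarity.
    ln = left / (np.linalg.norm(left, axis=0, keepdims=True) + 1e-8)
    rn = right / (np.linalg.norm(right, axis=0, keepdims=True) + 1e-8)
    corr = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(min(max_disp, w)):
        # Left pixel at column x is compared with right pixel at column x - d.
        corr[d, :, d:] = (ln[:, :, d:] * rn[:, :, : w - d]).sum(axis=0)
    return corr


def attention_feature_volume(left, right, max_disp):
    """Filter a feature volume with correlation weights (hypothetical sketch).

    Assumption: the feature volume is the left feature map broadcast along disparity.
    Returns an array of shape (C, max_disp, H, W).
    """
    weights = correlation_volume(left, right, max_disp)  # (D, H, W)
    feat_volume = left[:, None, :, :]                    # (C, 1, H, W)
    return weights[None, :, :, :] * feat_volume          # (C, D, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((8, 4, 6))
    right = rng.standard_normal((8, 4, 6))
    afv = attention_feature_volume(left, right, max_disp=3)
    print(afv.shape)
```

In this sketch, disparity hypotheses with high left-right similarity keep the corresponding features largely intact, while hypotheses with low similarity (or no valid match near the left image border) are suppressed before any cost aggregation would take place.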