Omnidirectional stereo matching (OSM) is an essential and reliable means for $360^{\circ}$ depth sensing. However, following earlier works on conventional stereo matching, prior state-of-the-art (SOTA) methods rely on a 3D encoder-decoder block to regularize the cost volume, which complicates the whole system and yields sub-optimal results. Recently, approaches based on Recurrent All-pairs Field Transforms (RAFT) have employed recurrent updates in 2D and have effectively improved image-matching tasks, i.e., optical flow and stereo matching. To bridge the gap between OSM and RAFT, we propose an opposite adaptive weighting scheme that seamlessly transforms the outputs of OSM's spherical sweeping into the inputs required by the recurrent update, thus creating a recurrent omnidirectional stereo matching (RomniStereo) algorithm. Furthermore, we introduce two techniques, i.e., grid embedding and adaptive context feature generation, which also contribute to RomniStereo's performance. Our best model improves the average MAE metric by 40.7\% over the previous SOTA baseline across five datasets. In visualizations of the results, our models demonstrate clear advantages on both synthetic and realistic examples. The code is available at \url{https://github.com/HalleyJiang/RomniStereo}.