Deep visual Simultaneous Localization and Mapping (SLAM) techniques, e.g., DROID, have made significant advances by applying deep visual odometry to dense flow fields. In general, they rely heavily on global visual similarity matching. However, ambiguous similarity interference in uncertain regions often introduces excessive noise into the correspondences, ultimately misleading the geometric modeling of SLAM. To address this issue, we propose Learnable Gaussian Uncertainty (LGU) matching, which focuses on constructing precise correspondences. In our scheme, a learnable 2D Gaussian uncertainty model is designed to associate matched frame pairs; it generates input-dependent Gaussian distributions for each correspondence map. Additionally, a multi-scale deformable correlation sampling strategy is devised that adaptively fine-tunes the sampling in each direction using a priori look-up ranges, enabling reliable correlation construction. Furthermore, a KAN-bias GRU component is adopted to strengthen the temporal iterative refinement, accomplishing sophisticated spatio-temporal modeling with limited parameters. Extensive experiments on real-world and synthetic datasets validate the effectiveness and superiority of our method.
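To give an intuition for how Gaussian uncertainty can suppress ambiguous matches, the following is a minimal NumPy sketch, not the paper's implementation: it assumes the network predicts a per-correspondence mean offset `mu` and per-axis standard deviations `sigma` (the actual model is learnable, input-dependent, and combined with multi-scale deformable sampling, all of which is omitted here). A large predicted `sigma` flattens the weights over the look-up window, so no single noisy correlation peak dominates.

```python
import numpy as np

def gaussian_uncertainty_weights(mu, sigma, radius=3):
    """Anisotropic 2D Gaussian weights over a (2r+1) x (2r+1) look-up window.

    mu    : (2,) hypothetical predicted offset of the match centre (x, y)
    sigma : (2,) hypothetical predicted per-axis standard deviations
    """
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    z = ((xs - mu[0]) / sigma[0]) ** 2 + ((ys - mu[1]) / sigma[1]) ** 2
    w = np.exp(-0.5 * z)
    return w / w.sum()  # normalize so the weights sum to 1

# A confident (small-sigma) prediction concentrates weight at the centre;
# an uncertain (large-sigma) one spreads it, damping ambiguous peaks.
corr = np.random.rand(7, 7)  # toy correlation patch around a candidate match
w_sharp = gaussian_uncertainty_weights(np.zeros(2), np.array([0.5, 0.5]))
w_flat = gaussian_uncertainty_weights(np.zeros(2), np.array([3.0, 3.0]))
score = (corr * w_sharp).sum()  # uncertainty-weighted correlation score
```

In a full pipeline, such weights would multiply the correlation volume before the update operator reads it, so regions the model deems uncertain contribute less to the pose and depth updates.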