We present DINO Patch Visual Odometry (DINO-VO), an end-to-end monocular visual odometry system with strong scene generalization. Current visual odometry (VO) systems often rely on heuristic feature extraction strategies, which can degrade accuracy and robustness, particularly in large-scale outdoor environments. DINO-VO addresses these limitations by incorporating a differentiable adaptive patch selector into the end-to-end pipeline, which improves the quality of the extracted patches and enhances generalization across diverse datasets. In addition, the system couples a multi-task feature extraction module with a differentiable bundle adjustment (BA) module that leverages inverse depth priors, so that it learns and exploits both appearance and geometric information. This coupling bridges the gap between feature learning and state estimation. Extensive experiments on the TartanAir, KITTI, EuRoC, and TUM datasets show that DINO-VO generalizes well across synthetic, indoor, and outdoor environments and achieves state-of-the-art tracking accuracy.