Monocular visual odometry (MVO) is vital in autonomous navigation and robotics because it offers a cost-effective and flexible way to track motion, but the scale ambiguity inherent in monocular setups often causes errors to accumulate over time. In this paper, we present BEV-ODOM, a novel MVO framework that leverages the bird's-eye-view (BEV) representation to address scale drift. Unlike existing approaches, BEV-ODOM integrates a depth-based perspective-view (PV) to BEV encoder, a correlation feature extraction neck, and a CNN-MLP-based decoder, which lets it estimate motion in three degrees of freedom without depth supervision or complex optimization techniques. Experiments on the NCLT, Oxford, and KITTI datasets indicate that BEV-ODOM outperforms current MVO methods, with reduced scale drift on long sequences and higher motion-estimation accuracy.
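To make the notion of scale drift concrete, the following is a minimal sketch of how per-frame three-degree-of-freedom motion estimates accumulate into a planar trajectory. It is not the BEV-ODOM implementation: the parameterization of each motion as (dx, dy, dyaw) in the vehicle frame, the function names `compose_se2` and `integrate`, and the 2% scale bias are all assumptions chosen for illustration.

```python
import numpy as np


def compose_se2(pose, delta):
    """Apply a body-frame motion (dx, dy, dyaw) to a world-frame pose (x, y, yaw)."""
    x, y, yaw = pose
    dx, dy, dyaw = delta
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate the body-frame translation into the world frame, then add it.
    # Yaw is accumulated without wrapping, which is fine for this illustration.
    return np.array([x + c * dx - s * dy, y + s * dx + c * dy, yaw + dyaw])


def integrate(deltas, scale=1.0):
    """Chain relative motions into a trajectory of shape (len(deltas) + 1, 3).

    `scale` multiplies every translation and stands in for a hypothetical
    constant scale error in the estimator.
    """
    pose = np.zeros(3)
    trajectory = [pose]
    for dx, dy, dyaw in deltas:
        pose = compose_se2(pose, (scale * dx, scale * dy, dyaw))
        trajectory.append(pose)
    return np.stack(trajectory)


# Assumed toy sequence: 100 frames, 1 m of forward motion per frame.
deltas = [(1.0, 0.0, 0.0)] * 100
truth = integrate(deltas)
biased = integrate(deltas, scale=1.02)  # assumed 2% per-frame scale error

# Position error grows with distance travelled instead of averaging out.
drift = np.linalg.norm(truth[:, :2] - biased[:, :2], axis=1)
```

In this toy setting the position error grows linearly with the distance travelled, which is why a small per-frame scale bias matters on long sequences.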