Visual odometry (VO) is essential for accurate point-goal navigation of embodied agents in indoor environments, where GPS and compass sensors are unreliable and inaccurate. However, traditional VO methods struggle in wide-baseline scenarios, where fast robot motions and low inference frame rates (FPS) degrade performance, leading to drift and catastrophic failures in point-goal navigation. Recent deep-learned VO methods show robust performance but are sample-inefficient during training, requiring huge datasets and compute resources. We therefore propose a robust and sample-efficient VO pipeline built on motion priors available while an agent navigates an environment. It consists of a training-free, action-prior-based geometric VO module that estimates a coarse relative pose, which is then consumed as a motion prior by a deep-learned VO model that produces the fine relative pose used by the navigation policy. This strategy makes our pipeline up to 2x more sample-efficient during training and yields superior accuracy and robustness in point-goal navigation tasks compared to state-of-the-art VO methods. Realistic indoor environments from the Gibson dataset are used in the AI-Habitat simulator to evaluate the proposed approach with navigation metrics (success rate, SPL) and pose metrics (RPE, ATE). We hope this method opens a direction of work in which motion priors from various sources are used to improve VO estimates and achieve better results in embodied navigation tasks.
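The coarse-to-fine pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the nominal per-action motions, the `model` refinement call, and all function names are assumptions, and the SE(2) composition is simplified to a planar agent.

```python
import numpy as np

# Nominal motion implied by each discrete navigation action
# (assumed values; actuation noise makes the true motion differ,
# which is what the learned refinement model corrects).
ACTION_PRIOR = {
    "move_forward": (0.25, 0.0),                 # (meters forward, radians turned)
    "turn_left":    (0.0,  np.deg2rad(10.0)),
    "turn_right":   (0.0, -np.deg2rad(10.0)),
}

def coarse_pose_from_action(action):
    """Training-free action prior: egocentric SE(2) delta (dx, dz, dtheta)
    implied by the commanded action (forward motion along -z by convention)."""
    dist, dtheta = ACTION_PRIOR[action]
    return np.array([0.0, -dist, dtheta])

def refine_pose(rgb_t, rgb_t1, coarse_delta, model):
    """Hypothetical learned refinement: the deep VO model consumes the
    observation pair plus the coarse motion prior and outputs a fine delta."""
    return model(rgb_t, rgb_t1, coarse_delta)

def integrate(pose, delta):
    """Compose an egocentric SE(2) delta onto a global (x, z, theta) pose,
    yielding the odometry estimate consumed by the navigation policy."""
    x, z, th = pose
    dx, dz, dth = delta
    x += dx * np.cos(th) - dz * np.sin(th)
    z += dx * np.sin(th) + dz * np.cos(th)
    return np.array([x, z, th + dth])
```

Even on its own, the action prior gives a drift-prone but free pose estimate; the learned model only has to predict the (small) residual between the commanded and actual motion, which is one plausible reading of why training becomes more sample-efficient.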