In autonomous navigation of terrestrial robots, building realistic models of agent dynamics and sensing is common practice in the robotics literature and in commercial applications, where such models are used for model-based control and/or for localization and mapping. The more recent Embodied AI literature, on the other hand, focuses on modular or end-to-end agents trained in simulators like Habitat or AI-Thor, where the emphasis is on photo-realistic rendering and scene diversity, while high-fidelity robot motion plays a lesser role. The resulting sim2real gap significantly impacts the transfer of trained models to real robotic platforms. In this work we explore end-to-end training of agents in simulation in settings that minimize the sim2real gap in both sensing and actuation. Our agent directly predicts (discretized) velocity commands, which are maintained through closed-loop control on the real robot. The behavior of the real robot (including the underlying low-level controller) is identified and simulated in a modified Habitat simulator. Noise models for odometry and localization further contribute to lowering the sim2real gap. We evaluate on real navigation scenarios, explore different localization and point goal calculation methods, and report significant gains in performance and robustness compared to prior work.
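To make the idea of discretized velocity commands tracked by a low-level controller more concrete, the following is a minimal sketch, not the identified model from this work. It assumes a first-order lag response for linear and angular velocity and a unicycle motion model. The bin values, the time constants `tau_v` and `tau_w`, the time step `dt`, and the names `ACTIONS`, `RobotState`, and `step` are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical discretization of the velocity action space
# (illustrative values, not taken from the paper).
LINEAR_BINS = (0.0, 0.25, 0.5)    # linear velocity commands in m/s
ANGULAR_BINS = (-1.0, 0.0, 1.0)   # angular velocity commands in rad/s
ACTIONS = [(v, w) for v in LINEAR_BINS for w in ANGULAR_BINS]


@dataclass
class RobotState:
    x: float = 0.0      # position in m
    y: float = 0.0      # position in m
    theta: float = 0.0  # heading in rad
    v: float = 0.0      # current linear velocity in m/s
    w: float = 0.0      # current angular velocity in rad/s


def step(state, action_index, dt=0.1, tau_v=0.3, tau_w=0.2):
    """Advance the simulated robot by one time step.

    Assumption: the closed-loop velocity controller is approximated by a
    first-order lag with time constants tau_v and tau_w (hypothetical values).
    """
    v_cmd, w_cmd = ACTIONS[action_index]

    # Exact discretization of a first-order lag over dt.
    alpha_v = 1.0 - math.exp(-dt / tau_v)
    alpha_w = 1.0 - math.exp(-dt / tau_w)
    v = state.v + alpha_v * (v_cmd - state.v)
    w = state.w + alpha_w * (w_cmd - state.w)

    # Unicycle kinematics, integrated with a simple Euler step.
    theta = state.theta + w * dt
    x = state.x + v * math.cos(theta) * dt
    y = state.y + v * math.sin(theta) * dt
    return RobotState(x, y, theta, v, w)


if __name__ == "__main__":
    state = RobotState()
    forward = ACTIONS.index((0.5, 0.0))  # command: drive straight ahead
    for _ in range(10):
        state = step(state, forward)
    print(state)
```

In a sketch like this the commanded velocity is reached only gradually, which is the kind of actuation behavior that a purely kinematic, teleport-style simulator step would ignore.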