We address the challenge of enhancing the navigation autonomy of planetary rovers using reinforcement learning (RL). Future space missions will require rovers with advanced autonomous navigation capabilities to meet their objectives. RL has shown clear potential for robotic autonomy, but its reliance on simulation poses a challenge: policies transferred to the real world often encounter the "reality gap", which disrupts the transition from virtual to physical environments. The reality gap is exacerbated in mapless navigation on Mars- and Moon-like terrains, where unpredictable terrain and environmental factors play a significant role. Effective navigation therefore requires a method attuned to these complexities and to the noise in real-world data. We introduce a novel two-stage RL approach that uses offline noisy data. Our approach employs a teacher-student policy learning paradigm inspired by the "learning by cheating" method. The teacher policy is trained in simulation. The student policy is then trained on noisy data, aiming to mimic the teacher's behavior while being more robust to real-world uncertainties. Our policies are transferred to a custom-designed rover for real-world testing. Comparative analyses of the teacher and student policies show that our approach offers improved behavioral performance, greater noise resilience, and more effective sim-to-real transfer.
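The two-stage teacher-student idea can be illustrated with a toy sketch. This is not the authors' implementation: the linear policies, the Gaussian sensor noise, the ridge-regression fit, and every name below (`teacher_policy`, `add_sensor_noise`, `train_student`, `imitation_error`) are assumptions chosen only to keep the example small. The teacher acts on clean observations, standing in for a policy trained in simulation; the student is fitted on noisy observations to reproduce the teacher's actions.

```python
import numpy as np


def teacher_policy(clean_obs, weights):
    """Hypothetical stand-in for a simulation-trained teacher: a fixed linear map."""
    return clean_obs @ weights


def add_sensor_noise(obs, noise_std, rng):
    """Corrupt observations with zero-mean Gaussian noise (an assumed noise model)."""
    return obs + rng.normal(0.0, noise_std, size=obs.shape)


def train_student(noisy_obs, teacher_actions, ridge=1e-3):
    """Fit a linear student by ridge regression so that its actions on noisy
    observations match the teacher's actions on the clean ones."""
    dim = noisy_obs.shape[1]
    gram = noisy_obs.T @ noisy_obs + ridge * np.eye(dim)
    return np.linalg.solve(gram, noisy_obs.T @ teacher_actions)


def imitation_error(weights, obs, target_actions):
    """Mean squared difference between a linear policy's actions and the targets."""
    return float(np.mean((obs @ weights - target_actions) ** 2))


rng = np.random.default_rng(0)
obs_dim, act_dim, n_samples, noise_std = 6, 2, 5000, 1.0

# Stage 1 (assumed already done): a teacher with access to clean observations.
teacher_w = rng.normal(size=(obs_dim, act_dim))

# Stage 2: collect an offline dataset of noisy observations labelled with
# the teacher's actions, then fit the student on it.
clean_obs = rng.normal(size=(n_samples, obs_dim))
teacher_actions = teacher_policy(clean_obs, teacher_w)
noisy_obs = add_sensor_noise(clean_obs, noise_std, rng)
student_w = train_student(noisy_obs, teacher_actions)

# Evaluate both policies on fresh noisy inputs against the clean-input teacher actions.
test_clean = rng.normal(size=(n_samples, obs_dim))
test_noisy = add_sensor_noise(test_clean, noise_std, rng)
test_targets = teacher_policy(test_clean, teacher_w)
teacher_err = imitation_error(teacher_w, test_noisy, test_targets)
student_err = imitation_error(student_w, test_noisy, test_targets)
print(f"teacher on noisy input: {teacher_err:.3f}, student on noisy input: {student_err:.3f}")
```

In this toy setting the student tends to track the intended actions more closely than the teacher when both receive noisy inputs, because the regression shrinks its weights to account for the noise. A real system would replace the linear maps with learned policies and the synthetic noise with recorded rover data.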