Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques that tackle these challenges in general, the lack of an open-source benchmark and of reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose which learning methods to use for their mobile robots, and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata for applying deep RL approaches to autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques, each aimed at achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and in real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve these desiderata for RL-based navigation systems.