Robots have been successfully used to perform tasks with high precision. In real-world environments with sparse rewards and multiple goals, however, learning remains a major challenge, and Reinforcement Learning (RL) algorithms fail to learn good policies. A common approach is to train in simulation and then fine-tune in the real world, but adapting to the real-world setting is itself difficult. In this paper, we present Ready for Production Hierarchical RL (ReProHRL), a method that divides tasks using hierarchical multi-goal navigation guided by reinforcement learning. We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world. Empirical results show that ReProHRL outperforms the state-of-the-art baseline in both simulation and real-world environments in terms of training time and performance. Although both methods achieve a 100% success rate in a simple environment for single-goal navigation, the proposed method outperforms the baseline by 18% in a more complex environment and by 5% in the multi-goal setting. As a real-world proof-of-concept demonstration, we deploy the proposed method on a Crazyflie nano-drone equipped with a front camera and perform multi-goal navigation experiments.