Reinforcement learning (RL) has recently achieved great success in various domains. Yet designing the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents learn the desired behaviour. Using a sparse reward conveniently mitigates these challenges. However, a sparse reward poses a challenge of its own, often resulting in unsuccessful training of the agent. In this paper, we therefore address the sparse reward problem in RL. Our goal is to find an effective alternative to reward shaping that does not rely on costly human demonstrations and is applicable to a wide range of domains. Hence, we propose to use model predictive control (MPC) as an experience source for training RL agents in sparse reward environments. Without the need for reward shaping, we successfully apply our approach to mobile robot navigation, both in simulation and in real-world experiments with a Kobuki Turtlebot 2. We furthermore demonstrate a substantial improvement over pure RL algorithms in terms of success rate as well as the number of collisions and timeouts. Our experiments show that MPC as an experience source improves the agent's learning process for a given task in the case of sparse rewards.