Reinforcement learning (RL) provides a compelling framework for enabling autonomous vehicles to keep learning and improving diverse driving behaviors on their own. However, training real-world autonomous vehicles with current RL algorithms presents several challenges. One critical challenge, often overlooked in these algorithms, is the need to reset the driving environment between episodes. While resetting an environment after each episode is trivial in simulation, it demands significant human intervention in the real world. In this paper, we introduce a novel autonomous algorithm that allows off-the-shelf RL algorithms to train an autonomous vehicle with minimal human intervention. Our algorithm uses the learning progress of the autonomous vehicle to decide when to abort an episode before the vehicle enters an unsafe state, and where to reset the vehicle for the next episode so that it gathers informative transitions. The learning progress is estimated from the novelty of both current and future states. We also take advantage of rule-based autonomous driving algorithms to safely reset the vehicle to an initial state. We evaluate our algorithm against baselines on diverse urban driving tasks. The experimental results show that our algorithm is task-agnostic and achieves better driving performance with fewer manual resets than the baselines.
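To make the abort and reset decisions described in the abstract more concrete, the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a count-based novelty score, hashable (discretized) states, a list of predicted future states supplied by some external model, and that high novelty signals little learning progress. The names `CountNovelty`, `novelty_ahead`, `should_abort`, `choose_reset_state`, and the `discount` and `threshold` parameters are all hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


class CountNovelty:
    """Count-based novelty score (an assumed stand-in for the paper's estimator)."""

    def __init__(self) -> None:
        self.visits: Counter = Counter()

    def update(self, state: Hashable) -> None:
        # Record one more visit to this (discretized) state.
        self.visits[state] += 1

    def score(self, state: Hashable) -> float:
        # 1.0 for a never-visited state, decaying toward 0 with more visits.
        return 1.0 / (1.0 + self.visits[state]) ** 0.5


def novelty_ahead(
    novelty: CountNovelty,
    current: Hashable,
    predicted_future: Sequence[Hashable],
    discount: float = 0.9,
) -> float:
    """Discount-weighted average novelty of the current and predicted future states."""
    total = novelty.score(current)
    norm = 1.0
    weight = 1.0
    for state in predicted_future:
        weight *= discount
        total += weight * novelty.score(state)
        norm += weight
    return total / norm


def should_abort(
    novelty: CountNovelty,
    current: Hashable,
    predicted_future: Sequence[Hashable],
    threshold: float = 0.6,
) -> bool:
    """Assumption: abort when the states ahead are too unfamiliar to trust the policy."""
    return novelty_ahead(novelty, current, predicted_future) > threshold


def choose_reset_state(novelty: CountNovelty, candidates: Sequence[Hashable]) -> Hashable:
    """Assumption: reset to the candidate start state with the highest novelty."""
    return max(candidates, key=novelty.score)
```

In this sketch, a rule-based driving controller (not shown) would carry the vehicle to the state returned by `choose_reset_state` once `should_abort` fires, so that a human only intervenes when that controller cannot complete the reset.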