Reinforcement Learning (RL) has the potential to surpass human performance in driving without needing any expert supervision. Despite its promise, the state of the art in sensorimotor self-driving is dominated by imitation learning methods due to the inherent shortcomings of RL algorithms. Nonetheless, RL agents are able to discover highly successful policies when provided with privileged ground-truth representations of the environment. In this work, we investigate what separates privileged RL agents from sensorimotor agents for urban driving in order to bridge the gap between the two. We propose vision-based deep learning models to approximate the privileged representations from sensor data. In particular, we identify aspects of the state representation that are crucial for the success of the RL agent, such as desired route generation and stop zone prediction, and propose solutions to gradually develop less privileged RL agents. We also observe that bird's-eye-view models trained on offline datasets do not generalize to online RL training due to distribution mismatch. Through rigorous evaluation in the CARLA simulation environment, we shed light on the significance of state representations in RL for autonomous driving and point to unresolved challenges for future research.