Reinforcement learning (RL) is an effective approach to motion planning in autonomous driving, where an optimal driving policy can be learned automatically from data collected through interaction with the environment. Nevertheless, the reward function of an RL agent, which is critical to its performance, is challenging to determine. Conventional work mainly focuses on rewarding safe driving states but does not incorporate awareness of the risky driving behaviors of the vehicles. In this paper, we investigate how risk-aware reward shaping can improve the training and test performance of RL agents in autonomous driving. Based on the essential requirements that prescribe the safety specifications for general autonomous driving in practice, we propose additional shaped reward terms that encourage exploration and penalize risky driving behaviors. A simulation study in OpenAI Gym indicates the advantage of risk-aware reward shaping for various RL agents. We also point out that proximal policy optimization (PPO) is likely to be the best RL method to pair with risk-aware reward shaping.
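To make the idea of shaped reward terms concrete, the following is a minimal sketch of how an exploration bonus and a risk penalty might be added to an environment's base reward. It is not the formulation proposed in this paper: the `DrivingState` fields, the `shaped_reward` function, the linear form of each term, and all weights and thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrivingState:
    """Hypothetical per-step summary of the ego vehicle (not taken from the paper)."""
    speed: float            # forward speed in m/s
    gap_to_obstacle: float  # distance to the nearest obstacle in metres
    off_road: bool          # True if the vehicle has left the drivable area


def shaped_reward(base_reward: float, state: DrivingState,
                  safe_gap: float = 5.0, progress_weight: float = 0.1,
                  risk_weight: float = 1.0, off_road_penalty: float = 10.0) -> float:
    """Add illustrative exploration and risk terms to the environment reward.

    All weights and thresholds are assumed values, not results from the paper.
    """
    # Exploration term: a small bonus for moving, so that standing still
    # is not the trivially "safe" optimum.
    exploration_bonus = progress_weight * max(state.speed, 0.0)

    # Risk term: a penalty that grows linearly as the gap to the nearest
    # obstacle falls below the assumed safe distance.
    gap_deficit = max(safe_gap - state.gap_to_obstacle, 0.0)
    risk_penalty = risk_weight * gap_deficit / safe_gap

    # Leaving the drivable area is treated as a risky behavior in its own right.
    if state.off_road:
        risk_penalty += off_road_penalty

    return base_reward + exploration_bonus - risk_penalty
```

In a sketch like this, the shaped reward would replace the environment reward at every step before it is passed to the RL agent; the relative size of the exploration and risk weights determines how cautious the learned policy tends to be.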