Reinforcement learning (RL) is an effective approach to motion planning in autonomous driving, where an optimal driving policy can be learned automatically from data collected by interacting with the environment. Nevertheless, the reward function of an RL agent, which is critical to its performance, is difficult to design. Conventional work mainly focuses on rewarding safe driving states but does not incorporate awareness of risky driving behaviors of the vehicles. In this paper, we investigate how to use risk-aware reward shaping to improve the training and test performance of RL agents in autonomous driving. Based on the essential requirements that prescribe the safety specifications for general autonomous driving in practice, we propose additional reshaped reward terms that encourage exploration and penalize risky driving behaviors. A simulation study in OpenAI Gym indicates the advantage of risk-aware reward shaping for various RL agents. We also point out that proximal policy optimization (PPO) is likely to be the best RL method to combine with risk-aware reward shaping.
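To make the idea of reshaped reward terms concrete, the following is a minimal sketch of how a base reward might be augmented with an exploration bonus and a risk penalty. It is an illustration only and not the reward defined in this paper: the `DrivingState` fields, the time-headway threshold, the off-road penalty, and the weights `risk_weight` and `explore_weight` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class DrivingState:
    """Hypothetical per-step summary of the ego vehicle (not from the paper)."""
    speed: float         # ego speed in m/s
    gap_to_lead: float   # distance in metres to the nearest vehicle ahead
    off_road: bool       # True if the ego vehicle has left the drivable area
    progress: float      # metres advanced along the route during this step


def shaped_reward(base_reward: float,
                  state: DrivingState,
                  min_time_headway: float = 2.0,
                  risk_weight: float = 1.0,
                  explore_weight: float = 0.1) -> float:
    """Return the base reward plus an exploration bonus minus a risk penalty.

    All thresholds and weights are assumed values chosen for illustration.
    """
    # Time headway: seconds needed to cover the gap at the current speed.
    headway = state.gap_to_lead / max(state.speed, 1e-6)
    # The tailgating term grows linearly from 0 to 1 as headway drops to zero.
    tailgating = max(0.0, min_time_headway - headway) / min_time_headway
    risk_penalty = tailgating + (1.0 if state.off_road else 0.0)
    # Rewarding forward progress discourages the degenerate "stand still" policy.
    exploration_bonus = max(0.0, state.progress)
    return base_reward - risk_weight * risk_penalty + explore_weight * exploration_bonus


safe = DrivingState(speed=10.0, gap_to_lead=50.0, off_road=False, progress=1.0)
risky = DrivingState(speed=20.0, gap_to_lead=10.0, off_road=True, progress=1.0)
print(shaped_reward(1.0, safe), shaped_reward(1.0, risky))
```

In this sketch the agent keeps the environment's original reward and only receives additive corrections, so the same function could in principle be applied to any RL algorithm, PPO included, without changing the learner itself.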