Unsignalized intersections are typically considered one of the most representative and challenging scenarios for self-driving vehicles. To tackle autonomous driving problems in such scenarios, this paper proposes a curriculum proximal policy optimization (CPPO) framework with stage-decaying clipping. By adjusting the clipping parameter across the stages of proximal policy optimization (PPO) training, the vehicle can first rapidly search for an approximately optimal policy or its neighborhood with a large clipping parameter, and then converge to the optimal policy with a small one. In particular, a stage-based curriculum learning technique is incorporated into the proposed framework to improve generalization performance and further accelerate training. Moreover, the reward function is specially designed for the different curriculum settings. A series of comparative experiments is conducted in intersection-crossing scenarios with bi-lane carriageways to verify the effectiveness of the proposed CPPO method. The results show that the proposed approach adapts better to different dynamic and complex environments and trains faster than the baseline methods.
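The stage-decaying clipping idea can be illustrated with a minimal sketch. It uses the standard PPO clipped surrogate objective and a clipping parameter that shrinks from one training stage to the next. The schedule values in `STAGE_CLIPS` and the helper names `clip_for_stage` and `ppo_clipped_objective` are assumptions made for illustration; the abstract does not state the actual values or the decay rule used in CPPO.

```python
# Hypothetical clipping schedule, one value per training stage.
# These numbers are illustrative only and are not taken from the paper.
STAGE_CLIPS = (0.3, 0.2, 0.1)


def clip_for_stage(stage, stage_clips=STAGE_CLIPS):
    """Return the clipping parameter for a 0-based stage index.

    Stages past the end of the schedule reuse the last (smallest) value.
    """
    if stage < 0:
        raise ValueError("stage must be non-negative")
    return stage_clips[min(stage, len(stage_clips) - 1)]


def ppo_clipped_objective(ratios, advantages, clip_eps):
    """Standard PPO clipped surrogate objective, averaged over the batch.

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clipping parameter; ratios are clipped to [1 - eps, 1 + eps]
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        r_clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, r_clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    ratios = [0.6, 1.0, 1.5]
    advantages = [1.0, -1.0, 2.0]
    for stage in range(3):
        eps = clip_for_stage(stage)
        value = ppo_clipped_objective(ratios, advantages, eps)
        print(f"stage {stage}: clip={eps}, objective={value:.4f}")
```

With a large clipping parameter in the early stage, a sample whose ratio is far from 1 can still contribute a large update; as the parameter decays, the same sample is clipped more tightly, which limits the size of each policy step in later stages.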