A safe and efficient decision-making system is crucial for autonomous vehicles. However, the complexity of driving environments limits the effectiveness of many rule-based and machine-learning-based decision-making approaches. Reinforcement Learning offers a promising solution to these challenges in autonomous driving, although concerns about safety and efficiency during training remain major obstacles to its widespread application. To address these concerns, we propose a novel framework named Simple to Complex Collaborative Decision. First, we rapidly train a teacher model with the Proximal Policy Optimization (PPO) algorithm in a lightweight autonomous driving simulation environment. In a more complex simulation environment, the teacher model assesses the value of the student agent's actions and intervenes when the student exhibits sub-optimal behavior, averting dangerous situations. Next, we develop an innovative algorithm called Adaptive Clipping Proximal Policy Optimization. It trains on a mixture of samples generated by the teacher and student policies and applies dynamic clipping strategies based on sample importance, enabling the algorithm to use samples from diverse sources more efficiently. Additionally, we employ the KL divergence between the teacher's and student's policies as a constraint on policy optimization so that the student agent quickly learns the teacher's policy. Finally, an appropriate weaning strategy gradually reduces teacher intervention, ensuring that the student agent can fully explore the environment independently during the later stages of training. Simulation experiments in highway lane-change scenarios demonstrate that, compared with baseline algorithms, our framework not only improves learning efficiency and reduces training cost but also significantly enhances safety during training.
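The adaptive-clipping surrogate with a KL pull toward the teacher can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the wider clip range for teacher-generated samples, and all coefficient values are assumptions introduced here for clarity.

```python
import numpy as np

def acppo_loss(ratio, advantage, is_teacher, kl_teacher_student,
               eps_student=0.2, eps_teacher=0.3, kl_coef=0.5):
    """Sketch of a PPO-style surrogate with source-dependent clipping.

    ratio            -- pi_student(a|s) / pi_behavior(a|s) importance ratios
    advantage        -- advantage estimates for each sample
    is_teacher       -- mask marking samples generated by the teacher policy
    kl_teacher_student -- per-state KL(pi_teacher || pi_student)

    The widened clip range for teacher samples (eps_teacher > eps_student)
    is one plausible reading of "dynamic clipping based on sample
    importance"; the actual rule in the paper may differ.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    is_teacher = np.asarray(is_teacher, dtype=bool)
    kl = np.asarray(kl_teacher_student, dtype=float)

    # Clip teacher-generated samples less aggressively so that valuable
    # off-policy teacher data is not discarded by the clipping operator.
    eps = np.where(is_teacher, eps_teacher, eps_student)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    surrogate = np.minimum(unclipped, clipped)

    # The KL penalty pulls the student policy toward the teacher's policy.
    return -(surrogate - kl_coef * kl).mean()
```

Minimizing this loss maximizes the clipped surrogate while keeping the student close to the teacher, mirroring the KL-constrained optimization described above.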
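The value-based intervention gate and the weaning schedule can be sketched as below. The margin, schedule shape, and step counts are illustrative assumptions rather than the paper's reported settings.

```python
def teacher_gate(q_teacher_action, q_student_action, margin=0.5):
    """Hypothetical intervention rule: the teacher overrides the student
    when the student's chosen action is valued at least `margin` worse
    than the teacher's preferred action (margin is an assumption)."""
    return q_student_action < q_teacher_action - margin

def intervention_probability(step, warmup_steps=50_000, wean_steps=150_000):
    """Hypothetical linear weaning schedule: the teacher intervenes
    whenever gated during warmup, then its intervention rate decays
    linearly to zero so the student explores independently later on."""
    if step < warmup_steps:
        return 1.0
    if step >= warmup_steps + wean_steps:
        return 0.0
    return 1.0 - (step - warmup_steps) / wean_steps
```

In training, the teacher's action would replace the student's only when `teacher_gate` fires and a coin flip with probability `intervention_probability(step)` succeeds, reproducing the gradual hand-off described in the abstract.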