Sensing and communication technologies have enhanced learning-based decision-making methods for multi-agent systems such as connected autonomous vehicles (CAVs). However, most existing safe reinforcement learning (RL) methods assume accurate state information. Given noisy sensor measurements and vulnerable communication channels, it remains challenging to satisfy safety requirements for CAVs under state uncertainty. In this work, we propose Robust Multi-Agent Proximal Policy Optimization with Robust Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Our approach combines a robust multi-agent reinforcement learning (MARL) algorithm with a control barrier function (CBF)-based safety shield to cope with perturbed or uncertain state inputs. The robust policy is trained with a worst-case Q-function regularization module that pursues a higher lower bound on reward. The robust CBF safety shield enforces collision-free constraints for CAVs in complicated driving scenarios, even when vehicle state information is perturbed. We validate the robustness and safety advantages of SR-MAPPO against baselines under different driving and state-perturbation scenarios in the CARLA simulator. The SR-MAPPO policy is shown to maintain higher safety rates and efficiency (reward) when threatened by both state perturbations and dangerous behaviors of unconnected vehicles.
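To make the idea of a robust CBF safety shield concrete, the sketch below filters an RL acceleration command for a single follower in one lane. It is a generic textbook construction under stated assumptions, not the SR-MAPPO shield. It assumes double-integrator longitudinal dynamics, a time-headway barrier h = gap - d_min - tau * ego_speed, and an L-infinity perturbation of radius eps on each measured quantity. The names `FollowState` and `robust_cbf_shield` and all parameter values are hypothetical. Because the CBF condition is affine in the state, tightening it by the worst case over the perturbation box is exact.

```python
from dataclasses import dataclass


@dataclass
class FollowState:
    # Hypothetical measured (possibly perturbed) car-following state.
    gap: float         # distance to the lead vehicle [m]
    ego_speed: float   # ego speed [m/s]
    lead_speed: float  # lead speed [m/s]


def robust_cbf_shield(u_rl, s, eps, d_min=5.0, tau=1.0, alpha=0.5,
                      a_min=-6.0, a_max=3.0):
    """Minimally modify the RL acceleration u_rl so that a robust CBF holds.

    Assumed model: gap' = lead_speed - ego_speed, ego_speed' = u.
    Barrier h = gap - d_min - tau * ego_speed, so the CBF condition
    h' + alpha * h >= 0 reads
        lead_speed - ego_speed - tau * u + alpha * h >= 0.
    """
    h_hat = s.gap - d_min - tau * s.ego_speed
    # Worst case of the affine condition when each measured quantity may be
    # off by up to eps: lead speed lower, ego speed higher, gap smaller.
    margin = eps * (2.0 + alpha + alpha * tau)
    u_bound = (s.lead_speed - s.ego_speed + alpha * h_hat - margin) / tau
    # The 1-D QP  min (u - u_rl)^2  s.t. u <= u_bound, a_min <= u <= a_max
    # has a closed-form solution: clip u_rl into the feasible interval.
    u = min(u_rl, u_bound, a_max)
    # If u_bound < a_min the constraint is infeasible; fall back to full braking.
    return max(u, a_min)


if __name__ == "__main__":
    close = FollowState(gap=20.0, ego_speed=15.0, lead_speed=10.0)
    print(robust_cbf_shield(2.0, close, eps=0.0))  # nominal shield
    print(robust_cbf_shield(2.0, close, eps=0.5))  # robust shield brakes harder
```

In the close-following example, the nominal shield caps the acceleration at -5.0 m/s². The robust margin pushes the bound to -6.5 m/s², which is then saturated at the assumed braking limit of -6.0 m/s². A larger eps therefore trades efficiency for safety, which matches the intuition behind a robust shield.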