Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware

Deep multi-agent reinforcement learning (MARL) has proven effective in simulation for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and there have been few hardware demonstrations of how to leverage communication and shared information in MARL. The difficulty stems from discrepancies between simulated and physical states, uncertainties in system state and model, the practical design of shared information, and the need for safety guarantees in both simulation and hardware. This paper presents RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with inter-agent communication, with both simulation and hardware demonstrations. RSR-RSMARL formulates the MARL problem with state representations (including state information shared among agents) and action representations that account for real system complexities. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module based on Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate that the RSR-RSMARL framework enhances driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
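To make the idea of a CBF-based safety shield concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simplified one-dimensional car-following model in which the ego vehicle controls its acceleration and obtains the gap and the lead vehicle's speed from sensing or V2V messages, with the lead vehicle assumed to hold constant speed. The barrier function h = gap - d_min - T * v_ego, the gain alpha, the actuator limits, and the names `ShieldParams`, `barrier`, and `cbf_shield` are all illustrative assumptions. With a single scalar input and a single constraint, the usual CBF quadratic program reduces to a closed-form clamp on the acceleration proposed by the policy.

```python
from dataclasses import dataclass


@dataclass
class ShieldParams:
    """Hypothetical shield parameters; all values are illustrative assumptions."""
    d_min: float = 0.5    # minimum standstill gap [m]
    headway: float = 0.8  # time headway T [s]
    alpha: float = 2.0    # gain of the linear class-K function [1/s]
    a_min: float = -3.0   # strongest braking the actuator allows [m/s^2]
    a_max: float = 2.0    # strongest acceleration the actuator allows [m/s^2]


def barrier(gap: float, v_ego: float, p: ShieldParams) -> float:
    """Barrier h = gap - d_min - T * v_ego; h >= 0 defines the safe set."""
    return gap - p.d_min - p.headway * v_ego


def cbf_shield(a_rl: float, gap: float, v_ego: float, v_lead: float,
               p: ShieldParams = ShieldParams()) -> float:
    """Return the acceleration closest to a_rl that satisfies the CBF condition.

    With gap_dot = v_lead - v_ego and v_ego_dot = a (lead acceleration assumed
    zero), h_dot = (v_lead - v_ego) - T * a. The condition
    h_dot + alpha * h >= 0 is therefore an upper bound on a.
    """
    h = barrier(gap, v_ego, p)
    a_safe_max = ((v_lead - v_ego) + p.alpha * h) / p.headway
    # One-dimensional QP with one upper-bound constraint: take the minimum.
    a = min(a_rl, a_safe_max)
    # Respect actuator limits. If a_safe_max < a_min, the CBF condition cannot
    # be met by this simple model and the shield falls back to maximum braking.
    return max(p.a_min, min(p.a_max, a))
```

With the default parameters above, a policy request of 1.0 m/s^2 passes through unchanged when the gap is large (for example 10 m at equal speeds), while the same request at a 2 m gap behind a slower lead vehicle is replaced by a braking command. In this sketch the shield only ever lowers the requested acceleration, so the policy's action is kept whenever it already satisfies the constraint.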
