Safety and scalability are two critical challenges for practical Multi-Agent Systems (MAS). However, existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping cannot effectively guarantee safety, and their scalability is severely limited by fixed-size network outputs. To address these issues, we propose a novel framework, Scalable Safe MARL (SS-MARL), which enhances both the safety and the scalability of MARL methods. Leveraging the inherent graph structure of MAS, we design a multi-layer message passing network to aggregate local observations and communications of varying sizes. Furthermore, we develop a constrained joint policy optimization method under local observability to improve safety. Simulation experiments demonstrate that SS-MARL achieves a better trade-off between optimality and safety than the baselines, and that its scalability significantly outperforms state-of-the-art methods in scenarios with large numbers of agents.