Multi-Agent Reinforcement Learning (MARL) has become a classic paradigm for solving diverse intelligent control tasks, such as autonomous driving in the Internet of Vehicles (IoV). However, the widely assumed existence of a central node for implementing centralized federated learning-assisted MARL may be impractical in highly dynamic scenarios, and excessive communication overhead may overwhelm the IoV system. Therefore, in this paper, we design a communication-efficient cooperative MARL algorithm, named RSM-MAPPO, to reduce communication overhead in a fully distributed architecture. In particular, RSM-MAPPO enhances multi-agent Proximal Policy Optimization (PPO) by incorporating the idea of segment mixture and augmenting multiple model replicas from policy segments received from neighbors. RSM-MAPPO then adopts a theory-guided metric to regulate the selection of contributive replicas and thereby guarantee policy improvement. Finally, extensive simulations in a mixed-autonomy traffic control scenario verify the effectiveness of RSM-MAPPO.
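The segment-mixture and replica-selection ideas described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the segmentation scheme, the mixing rule, or the theory-guided metric, so everything below is an assumption made for illustration: policies are flat parameter lists, segments are contiguous slices, mixing is a fixed-weight average, and `toy_score` is a hypothetical stand-in for the selection metric. The function names (`split_segments`, `build_replica`, `select_replica`) are likewise hypothetical and are not taken from the paper.

```python
def split_segments(params, num_segments):
    """Split a flat parameter list into contiguous, near-equal segments."""
    n = len(params)
    bounds = [(i * n) // num_segments for i in range(num_segments + 1)]
    return [params[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def build_replica(local_params, received, num_segments, mix_weight=0.5):
    """Build one candidate replica by mixing received segments into local ones.

    `received` maps a segment index to the values a neighbor sent for that
    segment. Segments that were not received keep the local values.
    The fixed-weight average is an assumed mixing rule.
    """
    replica = []
    for idx, segment in enumerate(split_segments(local_params, num_segments)):
        if idx in received:
            segment = [
                (1 - mix_weight) * own + mix_weight * other
                for own, other in zip(segment, received[idx])
            ]
        replica.extend(segment)
    return replica


def select_replica(local_params, replicas, score_fn):
    """Return the best-scoring candidate, keeping the local policy on ties.

    `score_fn` is a placeholder for the paper's theory-guided metric.
    """
    best, best_score = local_params, score_fn(local_params)
    for candidate in replicas:
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def toy_score(params):
    """Hypothetical metric: negative squared distance to an all-ones target."""
    return -sum((p - 1.0) ** 2 for p in params)


# Toy usage: one agent with six parameters and two sets of received segments.
local = [0.0] * 6
replica_a = build_replica(local, {0: [1.0, 1.0], 2: [2.0, 2.0]}, num_segments=3)
replica_b = build_replica(local, {1: [-4.0, -4.0]}, num_segments=3)
chosen = select_replica(local, [replica_a, replica_b], toy_score)
```

In this toy run only a fraction of each neighbor's parameters is exchanged per round, which is the source of the communication saving, and the selection step discards `replica_b` because it scores worse than the local policy under the assumed metric.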