Multi-Agent Reinforcement Learning (MARL) has emerged as a foundational approach for diverse intelligent control tasks, notably autonomous driving in the Internet of Vehicles (IoV). However, the widely assumed central node for centralized, federated learning-assisted MARL may be impractical in highly dynamic environments, and relying on it can cause excessive communication overhead, potentially overwhelming the IoV system. To address these challenges, we design RSM-MASAC, a novel communication-efficient policy collaboration algorithm for MARL that combines Soft Actor-Critic (SAC) with Decentralized Federated Learning (DFL) in a fully distributed architecture. In particular, RSM-MASAC enhances multi-agent collaboration and prioritizes communication efficiency in dynamic IoV systems by incorporating segmented aggregation from DFL: each agent builds multiple model replicas from policy segments received from its neighbors, and these replicas serve as reconstructed referential policies for mixing. Unlike traditional RL approaches, RSM-MASAC derives new bounds under Maximum Entropy Reinforcement Learning (MERL) and adopts a theory-guided mixture metric to regulate the selection of contributive referential policies, guaranteeing soft policy improvement during the communication phase. Finally, extensive simulations in mixed-autonomy traffic control scenarios verify the effectiveness and superiority of our algorithm.
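The segment-and-mix idea described in the abstract can be illustrated with a minimal sketch. It assumes each policy is a flat NumPy parameter vector and that every neighbor's full vector is available locally (in a real deployment only the requested segments would be transmitted). The function names `reconstruct_replicas` and `mix_if_useful`, the mixing weight `beta`, and the `score` callable are hypothetical. In particular, `score` is only a stand-in for the theory-guided mixture metric, whose actual form is not given in the abstract.

```python
import numpy as np


def reconstruct_replicas(local, neighbors, num_segments, num_replicas, rng):
    """Build reference policies by stitching parameter segments from random neighbors.

    Assumption: policies are flat 1-D arrays of equal length.
    """
    # Segment boundaries over the flat parameter vector.
    bounds = np.linspace(0, local.size, num_segments + 1).astype(int)
    replicas = []
    for _ in range(num_replicas):
        replica = local.copy()
        for start, stop in zip(bounds[:-1], bounds[1:]):
            # Each segment is taken from one randomly chosen neighbor.
            donor = neighbors[rng.integers(len(neighbors))]
            replica[start:stop] = donor[start:stop]
        replicas.append(replica)
    return replicas


def mix_if_useful(local, replicas, score, beta=0.5):
    """Mix in a reference policy only when a (hypothetical) score improves."""
    for ref in replicas:
        candidate = (1.0 - beta) * local + beta * ref
        # Placeholder gate: accept the mixture only if it scores higher.
        if score(candidate) > score(local):
            local = candidate
    return local
```

An agent would call `reconstruct_replicas` once per communication round and then pass the result to `mix_if_useful`. Because a mixture is accepted only when `score` increases, the returned parameters never score lower than the agent's starting policy under that placeholder metric.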