Multi-Agent Reinforcement Learning (MARL) has emerged as a foundational approach for diverse intelligent control tasks, notably autonomous driving in the Internet of Vehicles (IoV) domain. However, the widely assumed existence of a central node for centralized, federated learning-assisted MARL may be impractical in highly dynamic environments, and it can incur excessive communication overhead that overwhelms the IoV system. To address these challenges, we design RSM-MASAC, a novel communication-efficient policy collaboration algorithm for MARL under the Soft Actor-Critic (SAC) and Decentralized Federated Learning (DFL) frameworks, within a fully distributed architecture. In particular, RSM-MASAC enhances multi-agent collaboration and prioritizes communication efficiency in dynamic IoV systems by incorporating the concept of segmented aggregation from DFL and assembling multiple model replicas from received neighboring policy segments, which then serve as reconstructed referential policies for mixing. Distinctively diverging from traditional RL approaches, RSM-MASAC derives new performance bounds under Maximum Entropy Reinforcement Learning (MERL) and adopts a theory-guided mixture metric to regulate the selection of contributive referential policies, guaranteeing soft policy improvement during the communication phase. Finally, extensive simulations in mixed-autonomy traffic control scenarios verify the effectiveness and superiority of our algorithm.
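The replica-and-mixing idea above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes policy parameters are flat NumPy vectors, that a neighbor transmits one contiguous segment, and that the theory-guided mixture metric reduces to a placeholder scalar `beta`; the function names `reconstruct_replica` and `mix_policies` are hypothetical.

```python
import numpy as np

def reconstruct_replica(local_params, segment, start):
    """Build a referential policy replica by overwriting one segment
    of the local parameter vector with a segment received from a
    neighbor (hypothetical flat-vector layout)."""
    replica = local_params.copy()
    replica[start:start + len(segment)] = segment
    return replica

def mix_policies(local_params, replica, beta):
    """Convex mixture of local and referential policy parameters.
    In RSM-MASAC, beta would be chosen by the theory-guided mixture
    metric; here it is just a fixed scalar in [0, 1]."""
    return (1.0 - beta) * local_params + beta * replica

# Toy example: an 8-parameter local policy and a 4-element segment
# received from a neighbor, inserted at offset 2.
local = np.zeros(8)
seg = np.ones(4)
replica = reconstruct_replica(local, seg, start=2)
mixed = mix_policies(local, replica, beta=0.5)
```

Only segments (not full models) traverse the network, which is the source of the communication savings; mixing happens locally once a replica has been reconstructed.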