With wireless devices increasingly forming unified smart networks for seamless, user-friendly operation, random access (RA) medium access control (MAC) design is considered a key solution for handling unpredictable data traffic from multiple terminals. However, designing an effective RA-based MAC protocol that minimizes collisions while ensuring transmission fairness across devices remains challenging. Existing multi-agent reinforcement learning (MARL) approaches based on centralized training and decentralized execution (CTDE) have been proposed to optimize RA performance, but their reliance on centralized training and the significant overhead of information collection can make real-world deployment impractical. In this work, we adopt a fully decentralized MARL architecture in which policy learning relies not on centralized training but on consensus-based information exchange among devices. We design our MARL algorithm over an actor-critic (AC) network and propose exchanging only local rewards to minimize communication overhead. Furthermore, we provide a theoretical proof of global convergence for our approach. Numerical experiments show that the proposed MARL algorithm significantly improves RA network performance compared with other baselines.
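To make the consensus-based local-reward exchange concrete, the following is a minimal sketch (not the authors' implementation) of how decentralized agents could agree on the network-wide average reward using only neighbor-to-neighbor exchanges. It assumes a fixed communication graph with a doubly stochastic mixing matrix W; the names `consensus_step`, `rewards`, and the ring topology are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def consensus_step(local_values: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One round of neighbor averaging: each agent replaces its value with a
    weighted average of its neighbors' values (W[i, j] > 0 only if agents
    i and j can communicate)."""
    return W @ local_values

# Example: 4 agents on a ring, each initially knowing only its own local reward.
rewards = np.array([1.0, 0.0, 0.5, 0.2])
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])  # doubly stochastic weights respecting the ring topology

estimates = rewards.copy()
for _ in range(20):          # repeated local reward exchanges
    estimates = consensus_step(estimates, W)

# Each agent's estimate approaches the global average reward, which it can
# then plug into its local actor-critic update as the team objective,
# without any centralized reward collection.
print(estimates, rewards.mean())
```

In this sketch, the only quantity communicated is each agent's scalar reward estimate, which reflects the abstract's point that exchanging local rewards alone keeps the per-round communication overhead low.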