In this paper, we explore multi-agent reinforcement learning (MARL) techniques for designing grant-free random access (RA) schemes for low-complexity, low-power, battery-operated devices in massive machine-type communication (mMTC) wireless networks. We use the value decomposition networks (VDN) and QMIX algorithms with parameter sharing (PS) under centralized training and decentralized execution (CTDE), while maintaining scalability. We then compare the policies learned by VDN, QMIX, and the deep recurrent Q-network (DRQN), and explore the impact of including agent identifiers in the observation vector. We show that the MARL-based RA schemes can achieve a better throughput-fairness trade-off between agents without having to condition on the agent identifiers. We also present a novel correlated traffic model, which is more descriptive of mMTC scenarios, and show that the proposed algorithm can easily adapt to traffic non-stationarities.
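To make the combination of VDN, parameter sharing, and CTDE concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical setup of three devices, a four-dimensional observation per device, and two actions (stay silent or transmit), and it replaces the recurrent Q-network with a single shared linear map purely for illustration. It shows the property that VDN relies on: because the joint value is the sum of per-agent utilities, each agent's local greedy action coincides with the greedy joint action.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
N_AGENTS = 3    # hypothetical number of devices
OBS_DIM = 4     # hypothetical per-agent observation size
N_ACTIONS = 2   # assumed action set: 0 = stay silent, 1 = transmit

# Parameter sharing: a single weight matrix is used by every agent.
# A linear map stands in for the recurrent Q-network here.
W_shared = rng.normal(size=(OBS_DIM, N_ACTIONS))


def agent_q_values(obs):
    """Per-agent utilities Q_i(o_i, .) computed with the shared weights."""
    return obs @ W_shared  # shape: (N_AGENTS, N_ACTIONS)


def vdn_joint_q(q_values, joint_action):
    """VDN mixing: Q_tot is the plain sum of the chosen per-agent utilities."""
    return sum(q_values[i, a] for i, a in enumerate(joint_action))


obs = rng.normal(size=(N_AGENTS, OBS_DIM))
q = agent_q_values(obs)

# Decentralized execution: each agent maximizes only its own utility.
greedy_joint_action = tuple(int(a) for a in q.argmax(axis=1))

# Centralized check: brute-force search over every joint action.
best_joint_action = max(
    itertools.product(range(N_ACTIONS), repeat=N_AGENTS),
    key=lambda joint_action: vdn_joint_q(q, joint_action),
)

print(greedy_joint_action == best_joint_action)
```

QMIX generalizes the summation to a state-dependent mixing function that is monotonic in each per-agent utility, which preserves the same agreement between local and joint greedy actions.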