This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case in which the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train their models. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized-training-with-distributed-execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method under linear value function approximation. Our method maximizes network throughput and ensures fairness among stations, thereby enhancing overall network performance. Simulation results demonstrate that, in the saturated traffic scenario, the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rate compared with conventional carrier-sense multiple access with collision avoidance. Furthermore, QPMIX is shown to be robust in unsaturated and delay-sensitive traffic scenarios, and it promotes cooperation among heterogeneous agents.