This paper proposes a fully scalable multi-agent reinforcement learning (MARL) approach for packet scheduling in conflict graphs, aiming to minimize average packet delay. Each agent autonomously manages the schedule of a single link over one or multiple sub-bands, considering its own state and the states of conflicting links. The problem can be formulated as a decentralized partially observable Markov decision process (Dec-POMDP). The proposed solution leverages an on-policy reinforcement learning algorithm, multi-agent proximal policy optimization (MAPPO), within a multi-agent networked system, incorporating advanced recurrent structures in the neural network. The MARL design allows for fully decentralized training and execution, seamlessly scaling to very large networks. Extensive simulations across a diverse range of conflict graphs demonstrate that the proposed solution compares favorably to well-established schedulers in terms of both throughput and delay under various traffic conditions.
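To make the agent's local view concrete, the following is a minimal sketch of how a per-link observation could be assembled from a conflict graph. The four-link topology, the use of queue lengths as the link state, and the names `CONFLICTS`, `local_observation`, and `is_conflict_free` are illustrative assumptions, not the paper's actual state design or implementation.

```python
from typing import Dict, List, Set

# Hypothetical 4-link conflict graph (assumption for illustration only):
# an edge means two links cannot transmit on the same sub-band in the same slot.
CONFLICTS: Dict[int, Set[int]] = {
    0: {1},
    1: {0, 2},
    2: {1, 3},
    3: {2},
}


def local_observation(link: int, queues: Dict[int, int]) -> List[int]:
    """Return the link's own queue length followed by those of its conflicting links.

    Queue length stands in for the link state here; this is an assumption.
    """
    neighbors = sorted(CONFLICTS[link])
    return [queues[link]] + [queues[n] for n in neighbors]


def is_conflict_free(active: Set[int]) -> bool:
    """Return True if no two active links are adjacent in the conflict graph."""
    return all(CONFLICTS[link].isdisjoint(active) for link in active)


queues = {0: 3, 1: 0, 2: 5, 3: 1}
obs_link_1 = local_observation(1, queues)  # own queue, then links 0 and 2
feasible = is_conflict_free({0, 2})        # links 0 and 2 do not conflict
infeasible = is_conflict_free({1, 2})      # links 1 and 2 conflict
```

Because each observation depends only on a link and its neighbors in the conflict graph, its size is set by the link's degree rather than by the network size, which is consistent with the scalability claim above.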