This letter addresses a joint problem in uplink orthogonal frequency division multiple access (OFDMA) for IEEE 802.11ax: user scheduling and frequency resource allocation (USRA), multi-input multi-output mode selection (MIMO MS) between single-user (SU) MIMO and multi-user (MU) MIMO, and MU-MIMO user selection. Specifically, we focus on \textit{unsaturated traffic conditions}, in which users' data demands fluctuate. Under such conditions, accounting for each user's packet volume makes the problem combinatorial, because MU-MIMO user selection and resource allocation (RA) must be optimized simultaneously along the time, frequency, and space axes. The resulting number of unknown variables is so large that conventional optimization methods find the problem nearly impossible to address. In response, this letter proposes an approach based on deep hierarchical reinforcement learning (DHRL) to solve the joint problem. Rather than simply adopting off-the-shelf DHRL, we \textit{tailor} the DHRL to the joint USRA and MS problem, thereby significantly improving both convergence speed and throughput. Extensive simulation results show that the proposed algorithm achieves significantly higher throughput than existing schemes under various unsaturated traffic conditions.