This letter tackles a joint problem of user scheduling and frequency resource allocation (USRA), multi-input multi-output mode selection (MIMO MS) between single-user MIMO and multi-user (MU) MIMO, and MU-MIMO user selection for uplink orthogonal frequency division multiple access (OFDMA) in IEEE 802.11ax. Specifically, we focus on \textit{unsaturated traffic conditions}, in which users' data demands fluctuate. Under these conditions, accounting for the per-user packet volume makes the problem combinatorial, because MU-MIMO user selection and resource allocation (RA) must be optimized simultaneously along the time, frequency, and space axes. The resulting large number of unknown variables makes the problem nearly intractable for conventional optimization methods. In response, this letter proposes a deep hierarchical reinforcement learning (DHRL) approach to solve the joint problem. Rather than simply adopting off-the-shelf DHRL, we \textit{tailor} DHRL to the joint USRA and MS problem, which significantly improves convergence speed and throughput. Extensive simulation results show that the proposed algorithm achieves significantly higher throughput than existing schemes under various unsaturated traffic conditions.