Channel Selection for Wi-Fi 7 Multi-Link Operation via Optimistic-Weighted VDN and Parallel Transfer Reinforcement Learning

Dense, unplanned IEEE 802.11 Wireless Fidelity (Wi-Fi) deployments and the continuous growth of services with stringent throughput and latency requirements have led both industry and academia to consider machine learning algorithms as promising techniques. Specifically, the ongoing IEEE 802.11be Extremely High Throughput (EHT) amendment, known as Wi-Fi 7, proposes Multi-Link Operation (MLO) for the first time. Among other effects, this new feature will increase the complexity of channel selection because it introduces multiple interfaces. In this paper, we present a Parallel Transfer Reinforcement Learning (PTRL)-based cooperative Multi-Agent Reinforcement Learning (MARL) algorithm named Parallel Transfer Reinforcement Learning Optimistic-Weighted Value Decomposition Networks (oVDN) to improve intelligent channel selection in IEEE 802.11be MLO-capable networks. Additionally, we compare the impact of different parallel transfer learning alternatives against a centralized non-transfer MARL baseline. Two PTRL methods are presented: Multi-Agent System (MAS) Joint Q-function Transfer, where the joint Q-function is transferred, and MAS Best/Worst Experience Transfer, where the best and worst experiences are transferred among MASs. Simulation results show that oVDNg, the variant that uses only the best experiences, is the best algorithm variant. Moreover, oVDNg offers gains of up to 3%, 7.2% and 11% when compared with the VDN, VDN-nonQ and non-PTRL baselines, respectively. Furthermore, oVDNg achieved a reward convergence gain of 33.3% in the 5 GHz interface over oVDNb and oVDN, which consider only the worst experiences and both types of experiences, respectively. Finally, our best PTRL alternative improved on the non-PTRL baseline in convergence speed by up to 40 episodes and in reward by up to 135%.
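
To make the two building blocks named above more concrete, the following Python sketch illustrates (i) the additive value decomposition used by plain VDN, where the joint Q-value is the sum of the per-agent Q-values, and (ii) one possible way to pick the best and/or worst experiences from a replay buffer before transferring them to another MAS. It is only a sketch under stated assumptions: the `Experience` fields, the function names, the ranking of experiences by immediate reward, and the parameter `k` are all hypothetical and are not taken from the paper. The optimistic weighting of oVDN is not modeled here.

```python
from typing import List, NamedTuple, Sequence


class Experience(NamedTuple):
    """Hypothetical replay-buffer entry for one multi-agent step."""
    state: tuple
    actions: tuple      # assumed: one channel index per agent/interface
    reward: float       # assumed: shared team reward
    next_state: tuple


def vdn_joint_q(per_agent_q: Sequence[float]) -> float:
    """Plain VDN decomposition: the joint Q-value is the sum of per-agent Q-values."""
    return sum(per_agent_q)


def select_experiences(buffer: Sequence[Experience], k: int,
                       mode: str = "best") -> List[Experience]:
    """Pick experiences to transfer to another MAS.

    Assumption: experiences are ranked by immediate reward.
    mode = "best"  -> the k highest-reward experiences
    mode = "worst" -> the k lowest-reward experiences
    mode = "both"  -> both groups, without duplicating entries
    """
    ranked = sorted(buffer, key=lambda e: e.reward, reverse=True)
    n = len(ranked)
    if mode == "best":
        return ranked[:k]
    if mode == "worst":
        return ranked[max(0, n - k):]
    if mode == "both":
        # Start the "worst" slice no earlier than index k to avoid overlap.
        return ranked[:k] + ranked[max(k, n - k):]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    demo = [Experience((0,), (1, 2), r, (1,)) for r in (0.1, 0.9, 0.5, -0.3)]
    print(vdn_joint_q([1.0, 2.5]))
    print([e.reward for e in select_experiences(demo, 2, "best")])
```

In this sketch, the "best", "worst" and "both" modes loosely mirror the oVDNg, oVDNb and oVDN variants described in the abstract, although the actual selection criterion used by the authors may differ.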
