From Simulation to Reality: Practical Deep Reinforcement Learning-based Link Adaptation for Cellular Networks

Link Adaptation (LA), which dynamically adjusts the Modulation and Coding Scheme (MCS) to track time-varying channels, is both crucial and challenging in cellular networks. Deep reinforcement learning (DRL)-based LA, which learns to make decisions by interacting with the environment, is a promising approach to improving throughput. However, existing DRL-based LA algorithms are typically evaluated in simplified simulation environments that neglect practical issues such as ACK/NACK feedback delay, retransmissions, and parallel hybrid automatic repeat request (HARQ). Moreover, these algorithms overlook the impact of DRL execution latency, which can significantly degrade system performance. To address these challenges, we propose Decoupling-DQN (DC-DQN), a new DRL framework based on Deep Q-Networks (DQN) that separates the coupled training and inference processes of traditional DRL into two modules: a real-time inference module and an out-of-decision-loop training module. Based on this framework, we introduce a novel DRL-based LA algorithm, DC-DQN-LA, whose state, action, and reward functions are designed to account for feedback delays, parallel HARQ, and retransmissions. We implemented a prototype using USRP software-defined radios and the srsRAN software. Experimental results show that, in a mobile scenario, DC-DQN-LA improves throughput by 40% to 70% over baseline LA algorithms while maintaining comparable block error rates, and that it adapts quickly to environment changes in a mobile-to-static scenario. These results demonstrate the efficiency and practicality of the proposed DRL-based LA algorithm.
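
The decoupled design can be illustrated with a small sketch. This is not the DC-DQN-LA implementation; it is a minimal illustration under loudly stated assumptions: a tabular Q-function stands in for the DQN, and the channel model (state s decodes any MCS index up to s+1), the reward (hypothetical bits per MCS on ACK, zero on NACK), the 4-slot feedback delay, the 8 HARQ processes, and the sync period are all hypothetical. The inference module only reads a copied weight snapshot, so training never sits in the per-slot decision path, and the hypothetical PendingFeedback buffer holds each decision per HARQ process until its delayed ACK/NACK arrives, only then handing a complete transition to the trainer.

```python
import random
from collections import deque

# All sizes, rewards, and the channel model below are hypothetical, for illustration only.
NUM_STATES = 3                        # quantized channel-quality levels
NUM_MCS = 4                           # candidate MCS indices 0..3
BITS_PER_MCS = [1.0, 2.0, 3.0, 4.0]   # hypothetical reward for an ACKed block


class InferenceModule:
    """Real-time path: picks an MCS from a copied weight snapshot and never trains."""

    def __init__(self):
        self.q = [[0.0] * NUM_MCS for _ in range(NUM_STATES)]

    def load(self, weights):
        # Copy so later training updates cannot change the snapshot in use.
        self.q = [row[:] for row in weights]

    def act(self, state, epsilon=0.0):
        if random.random() < epsilon:            # exploration
            return random.randrange(NUM_MCS)
        row = self.q[state]
        return max(range(NUM_MCS), key=row.__getitem__)  # greedy, lowest index on ties


class PendingFeedback:
    """Holds (state, action) per HARQ process until its delayed ACK/NACK arrives."""

    def __init__(self):
        self.pending = {}

    def on_transmit(self, harq_id, state, action):
        assert harq_id not in self.pending, "HARQ process still awaiting feedback"
        self.pending[harq_id] = (state, action)

    def on_feedback(self, harq_id, ack, next_state):
        state, action = self.pending.pop(harq_id)
        reward = BITS_PER_MCS[action] if ack else 0.0  # hypothetical reward
        return (state, action, reward, next_state)


class TrainingModule:
    """Out-of-loop path: Q-learning on replayed transitions, then publishes weights."""

    def __init__(self, alpha=0.5, gamma=0.0):
        self.q = [[0.0] * NUM_MCS for _ in range(NUM_STATES)]
        self.replay = deque(maxlen=1000)
        self.alpha, self.gamma = alpha, gamma

    def add(self, transition):
        self.replay.append(transition)

    def train(self, steps, batch=8):
        for _ in range(steps):
            for s, a, r, s2 in random.sample(list(self.replay), min(batch, len(self.replay))):
                target = r + self.gamma * max(self.q[s2])
                self.q[s][a] += self.alpha * (target - self.q[s][a])

    def weights(self):
        return [row[:] for row in self.q]


def run(num_slots=3000, delay=4, harq_procs=8, sync_every=50, seed=0):
    random.seed(seed)
    infer, trainer, pending = InferenceModule(), TrainingModule(), PendingFeedback()
    in_flight = deque()  # (feedback_slot, harq_id, ack), in transmission order
    for t in range(num_slots):
        state = random.randrange(NUM_STATES)     # hypothetical channel report
        # 1) ACK/NACK for the block sent `delay` slots ago arrives now.
        while in_flight and in_flight[0][0] == t:
            _, hid, ack = in_flight.popleft()
            trainer.add(pending.on_feedback(hid, ack, state))
        # 2) Real-time decision from the current snapshot; no training here.
        action = infer.act(state, epsilon=0.3)
        hid = t % harq_procs
        pending.on_transmit(hid, state, action)
        ack = action <= state + 1                # hypothetical: state s decodes MCS <= s+1
        in_flight.append((t + delay, hid, ack))
        # 3) Out of the decision loop: train, then push fresh weights.
        if t % sync_every == sync_every - 1:
            trainer.train(steps=20)
            infer.load(trainer.weights())
    return infer


if __name__ == "__main__":
    policy = run()
    print([policy.act(s) for s in range(NUM_STATES)])  # learned MCS per channel state
```

Under this toy channel the greedy policy should settle on [1, 2, 3], the highest decodable MCS in each state. For determinism the trainer is called inline every sync_every slots here; in a deployment it would more plausibly run in its own thread or process, so that the per-slot path only pays for act().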

翻译：链路自适应（LA）通过动态调整调制与编码方案（MCS）以适应时变信道，是蜂窝网络中至关重要且具有挑战性的技术。基于深度强化学习（DRL）的LA方法通过与环境的交互学习决策机制，是提升吞吐量的有效途径。然而，现有基于DRL的LA算法通常在简化的仿真环境中进行评估，忽略了ACK/NACK反馈延迟、重传及并行混合自动重传请求（HARQ）等实际因素。此外，这些算法未考虑DRL执行延迟对系统性能的显著影响。为应对这些挑战，本文提出解耦深度Q网络（DC-DQN）——一种基于深度Q网络（DQN）的新型DRL框架，将传统DRL中耦合的训练与推理过程分离为实时推理模块和决策环外训练模块。基于该框架，我们进一步提出新型DRL-LA算法DC-DQN-LA。该算法通过设计包含反馈延迟、并行HARQ和重传机制的状态、动作与奖励函数，实现了对实际场景的兼容。我们使用USRP软件定义无线电和srsRAN软件构建了原型系统。实验结果表明：在移动场景中，DC-DQN-LA相较基线LA算法可提升40%至70%的吞吐量，同时保持相当的误块率；在移动-静态切换场景中能快速适应环境变化。这些结果验证了所提DRL-LA算法的高效性与实用性。