We present a Reinforcement Learning (RL) approach to the problem of controlling the Discontinuous Reception (DRX) policy from a Base Transceiver Station (BTS) in a cellular network. We do so by optimally timing the transmission of fast Layer-2 signaling messages (a.k.a. Medium Access Control (MAC) Control Elements (CEs), as specified in 5G New Radio (NR)). Unlike more conventional approaches to DRX optimization, which rely on fine-tuning the values of DRX timers, we assess the gains that can be obtained solely by means of this MAC CE signaling. For the simulation part, we concentrate on traffic types typically encountered in Extended Reality (XR) applications, where the needs for battery drain minimization and overheating mitigation are particularly pressing. Both 3GPP 5G NR compliant and non-compliant (``beyond 5G'') MAC CEs are considered. Our simulation results show that the proposed technique strikes an improved trade-off between latency and energy savings compared to the conventional timer-based approaches characteristic of most current implementations. Specifically, our RL-based policy can nearly halve the active time for a single User Equipment (UE) with respect to a na\"ive MAC CE transmission policy, and still achieves a nearly 20% active time reduction for 9 simultaneously served UEs.