Dynamic radio resource management (RRM) in wireless networks is challenging, particularly in the context of Radio Access Network (RAN) slicing. RAN slicing is crucial for serving users with differing requirements, but it gives rise to complex optimization problems. Existing Reinforcement Learning (RL) approaches achieve good performance in RAN slicing, yet they typically rely on online algorithms or behavior cloning. These methods require either continuous interaction with the environment or access to high-quality datasets, which hinders their practical deployment. To address these limitations, this paper applies offline RL to the RAN slicing problem, a step towards more feasible and adaptive RRM methods. We demonstrate that offline RL can learn near-optimal policies from sub-optimal datasets, an improvement over existing practice. We also highlight the flexibility of offline RL: policy criteria can be adjusted without additional environmental interactions. Finally, we present empirical evidence that offline RL adapts to various service-level requirements, illustrating its potential in diverse RAN slicing scenarios.
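To make the two central claims concrete (learning from sub-optimal logs, and changing the policy criteria without new interaction), the following is a minimal sketch on a toy problem. It is not the method evaluated in the paper: the two-slice environment, the `step` dynamics, the reward weights `w_tp` and `w_lat`, and all function names are hypothetical, and the learner is plain tabular Q-learning replayed over a fixed log. It only works here because the random behavior policy happens to cover every state-action pair; practical offline RL algorithms add mechanisms to cope with poorly covered actions.

```python
import random

N_STATES = 4   # backlog of the latency-sensitive slice: 0..3 packets (toy assumption)
N_ACTIONS = 3  # resource blocks (RBs) given to that slice: 0..2; the rest go to the other slice
GAMMA = 0.9    # discount factor


def step(backlog, rbs):
    """Hypothetical toy dynamics: one packet arrives, `rbs` packets are served."""
    next_backlog = min(N_STATES - 1, max(0, backlog + 1 - rbs))
    throughput = (N_ACTIONS - 1) - rbs  # RBs left over for the throughput slice
    return next_backlog, throughput


def collect_dataset(n_steps=2000, seed=0):
    """Log transitions under a uniformly random (sub-optimal) behavior policy."""
    rng = random.Random(seed)
    data, s = [], 0
    for _ in range(n_steps):
        a = rng.randrange(N_ACTIONS)
        s_next, tp = step(s, a)
        # Raw KPIs are logged instead of a scalar reward, so the reward
        # can be recomputed later under different criteria.
        data.append((s, a, tp, s_next))
        s = s_next
    return data


def offline_q_learning(data, w_tp, w_lat, epochs=100, alpha=0.1):
    """Tabular Q-learning that only replays the fixed log (no environment calls)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(epochs):
        for s, a, tp, s_next in data:
            # Relabel the logged transition with the current criteria:
            # reward throughput, penalize the resulting backlog.
            reward = w_tp * tp - w_lat * s_next
            target = reward + GAMMA * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Return the greedy action (RBs for the latency slice) for each backlog level."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]


data = collect_dataset()
# Same log, two different criteria, no further interaction with `step`.
print("throughput-only:", greedy_policy(offline_q_learning(data, w_tp=1.0, w_lat=0.0)))
print("latency-focused:", greedy_policy(offline_q_learning(data, w_tp=1.0, w_lat=5.0)))
```

In this sketch the environment is touched only inside `collect_dataset`. Both policies are then derived from the same log by relabeling the stored KPIs with different weights, which is the sense in which the criteria can change without additional interaction.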