The Open Radio Access Network (Open RAN) paradigm, together with its reference architecture proposed by the O-RAN Alliance, is paving the way toward open, interoperable, observable and truly intelligent cellular networks. Machine Learning (ML) will play a pivotal role in this evolution, providing the tools needed to realize the vision of self-organizing O-RAN systems. However, to be actionable, ML algorithms need to demonstrate high reliability, deliver high performance, and adapt to varying network conditions, traffic demands, and performance requirements. To address these challenges, in this paper we propose a novel Deep Reinforcement Learning (DRL) agent design for O-RAN applications that can learn control policies under varying Service Level Agreements (SLAs) with heterogeneous minimum performance requirements. We focus on the case of RAN slicing and SLAs specifying maximum tolerable end-to-end latency levels. We use the OpenRAN Gym open-source environment to train a DRL agent that can adapt to varying SLAs, and we compare it against the state of the art. We show that our agent maintains a low SLA violation rate, 8.3x and 14.4x lower than approaches based on Deep Q-Learning (DQN) and Q-Learning, while consuming, respectively, 0.3x and 0.6x fewer resources and without the need for re-training.
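To make the SLA-conditioned design described above concrete, below is a minimal Python sketch of the two ingredients it implies: an observation vector that embeds the per-slice SLA latency target, and a reward that penalizes violations while favoring frugal Physical Resource Block (PRB) allocations. All names (`build_observation`, `sla_reward`, `prb_cost`) and the reward shape are illustrative assumptions, not the paper's actual formulation or the OpenRAN Gym API.

```python
import numpy as np

def build_observation(latency_ms, tx_mbps, buffer_bytes, sla_latency_ms):
    """Concatenate slice KPIs with the SLA latency target, so a single
    policy can condition on heterogeneous per-slice SLAs without
    re-training. (Hypothetical feature set, not from the paper.)"""
    return np.array([latency_ms, tx_mbps, buffer_bytes, sla_latency_ms],
                    dtype=np.float32)

def sla_reward(latency_ms, sla_latency_ms, prbs_allocated, prb_cost=0.01):
    """Strongly penalize SLA violations; when the SLA is met, reward
    allocations that leave more PRBs free for other slices.
    (Assumed reward shape, for illustration only.)"""
    if latency_ms > sla_latency_ms:
        return -1.0
    return 1.0 - prb_cost * prbs_allocated

# Example: the same network state under two different latency SLAs
# yields two distinct observations, letting one policy treat them differently.
obs_strict = build_observation(latency_ms=4.2, tx_mbps=12.0,
                               buffer_bytes=3e4, sla_latency_ms=5.0)
obs_loose = build_observation(latency_ms=4.2, tx_mbps=12.0,
                              buffer_bytes=3e4, sla_latency_ms=50.0)
print(sla_reward(latency_ms=4.2, sla_latency_ms=5.0, prbs_allocated=20))
```

Feeding the SLA target as part of the observation is what allows a single trained policy to adapt to new SLAs at inference time, matching the abstract's claim that no re-training is needed.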