The Open Radio Access Network (O-RAN) architecture enables intelligent, automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, offering capabilities beyond those of traditional RAN solutions. Within this paradigm, Traffic Steering (TS) is a pivotal RIC application that optimizes cell-level mobility settings in near-real-time, with the aim of significantly improving network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, an RL sub-policy is trained to learn an optimized mapping onto the action space. To apply CaRL to new network regions, we propose a knowledge transfer approach that initializes a new sub-policy from the knowledge learned by the trained policies. To evaluate CaRL, we build a data-driven, scalable RIC digital twin (DT) modeled on real-world data from a tier-1 mobile operator in the US, including network configuration, user geo-distribution, and traffic demand. We evaluate CaRL on two DT scenarios representing network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and with competing optimization approaches based on heuristic and Q-table algorithms. Benchmarking results show that CaRL performs best, improving the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in the two scenarios, respectively.
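To make the ideas of state space factorization, policy decomposition, and knowledge transfer more concrete, the following Python sketch may help. It is an illustration under strong assumptions, not the CaRL algorithm itself: the state encoding `(load_level, neighbor_load_level)`, the routing rule `sub_space_of`, the three-valued action set, the `toy_reward` function (a stand-in for a digital-twin throughput signal), and the tabular value learner used as each sub-policy are all hypothetical choices made for brevity.

```python
import random

# Hypothetical action set: lower / keep / raise a cell-level mobility offset.
ACTIONS = [-1, 0, 1]


class SubPolicy:
    """Tabular value learner for one sub-state space (illustrative stand-in)."""

    def __init__(self, lr=0.5, eps=0.1, seed=0):
        self.q = {}  # state -> {action: estimated reward}
        self.lr = lr
        self.eps = eps
        self.rng = random.Random(seed)

    def values(self, state):
        # Optimistic initial values encourage trying every action once.
        return self.q.setdefault(state, {a: 1.0 for a in ACTIONS})

    def act(self, state, greedy=False):
        vals = self.values(state)
        if not greedy and self.rng.random() < self.eps:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: vals[a])

    def update(self, state, action, reward):
        vals = self.values(state)
        vals[action] += self.lr * (reward - vals[action])


def sub_space_of(state):
    """Hypothetical factorization: route a state by its own load level."""
    load_level, _neighbor_load = state
    return "high_load" if load_level >= 2 else "low_load"


class CascadePolicy:
    """Dispatches each state to the sub-policy that owns its sub-space."""

    def __init__(self):
        self.sub_policies = {}

    def _sub_policy(self, state):
        key = sub_space_of(state)
        if key not in self.sub_policies:
            self.sub_policies[key] = SubPolicy()
        return self.sub_policies[key]

    def act(self, state, greedy=False):
        return self._sub_policy(state).act(state, greedy)

    def update(self, state, action, reward):
        self._sub_policy(state).update(state, action, reward)


def transfer(trained, **kwargs):
    """Warm-start a new sub-policy by averaging the values of trained ones."""
    new = SubPolicy(**kwargs)
    for state in {s for p in trained for s in p.q}:
        holders = [p.q[state] for p in trained if state in p.q]
        new.q[state] = {
            a: sum(h[a] for h in holders) / len(holders) for a in ACTIONS
        }
    return new


def toy_reward(state, action):
    # Hypothetical reward: loaded cells benefit from raising the offset.
    target = 1 if sub_space_of(state) == "high_load" else 0
    return 1.0 - abs(action - target)


policy = CascadePolicy()
train_states = [(2, 0), (0, 1)]  # one state per sub-space
for step in range(200):
    s = train_states[step % 2]
    a = policy.act(s)
    policy.update(s, a, toy_reward(s, a))

# Initialize a sub-policy for a "new region" from a trained sub-policy.
new_region = transfer([policy.sub_policies["high_load"]])
```

In this toy setting each sub-policy only has to cover its own slice of the state space, which is the intuition behind using several small sub-policies rather than one large model. The averaging in `transfer` is just one simple way to reuse trained values; the knowledge transfer approach used by CaRL may differ.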