CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering

The Open Radio Access Network (O-RAN) architecture empowers intelligent and automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, enabling capabilities beyond what is achievable with traditional RAN solutions. Within this paradigm, Traffic Steering (TS) emerges as a pivotal RIC application that optimizes cell-level mobility settings in near-real-time, aiming to significantly improve network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, a reinforcement learning (RL) sub-policy is trained to learn an optimized mapping onto the action space. To apply CaRL to new network regions, we propose a knowledge transfer approach that initializes a new sub-policy from the knowledge learned by the already trained policies. To evaluate CaRL, we build a data-driven and scalable RIC digital twin (DT) modeled on real-world data from a tier-1 mobile operator in the US, including network configuration, user geo-distribution, and traffic demand. We evaluate CaRL on two DT scenarios representing two network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and with competing optimization approaches based on heuristic and Q-table algorithms. Benchmarking results show that CaRL performs best among the compared approaches, improving the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in the two scenarios, respectively.
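To make the idea of state space splitting with per-segment sub-policies and knowledge transfer more concrete, the following is a minimal illustrative sketch, not the CaRL implementation. It assumes, purely for illustration, that each sub-policy is a tabular Q-learning agent, that a state is a pair `(segment, local_state)` in which `segment` identifies the sub-state space, and that knowledge transfer means initializing a new sub-policy with the average Q-values of the already trained ones. The names `SubPolicy`, `Cascade`, `policy_for`, and `transfer` are hypothetical.

```python
import random
from collections import defaultdict


class SubPolicy:
    """Tabular epsilon-greedy Q-learning agent for one sub-state space (assumed form)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, q_init=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        # Optional warm start from transferred Q-values.
        for local_state, values in (q_init or {}).items():
            self.q[local_state] = list(values)

    def act(self, local_state, rng=random):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_actions)
        values = self.q[local_state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, local_state, action, reward, next_local_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_local_state])
        self.q[local_state][action] += self.alpha * (target - self.q[local_state][action])


class Cascade:
    """Routes each state to the sub-policy that owns its segment (hypothetical)."""

    def __init__(self, n_actions, **policy_kwargs):
        self.n_actions = n_actions
        self.policy_kwargs = policy_kwargs
        self.sub_policies = {}

    def transfer(self):
        # Assumed transfer rule: average Q-values over all existing sub-policies.
        sums, counts = {}, {}
        for policy in self.sub_policies.values():
            for local_state, values in policy.q.items():
                acc = sums.setdefault(local_state, [0.0] * self.n_actions)
                for a, v in enumerate(values):
                    acc[a] += v
                counts[local_state] = counts.get(local_state, 0) + 1
        return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}

    def policy_for(self, state):
        segment, _ = state
        if segment not in self.sub_policies:
            # A new segment (e.g. a new network region) starts from transferred knowledge.
            self.sub_policies[segment] = SubPolicy(
                self.n_actions, q_init=self.transfer(), **self.policy_kwargs
            )
        return self.sub_policies[segment]
```

In this sketch, a sub-policy created for a previously unseen segment starts from what the existing sub-policies have learned for the same local states instead of from zeros. The transfer mechanism actually used by CaRL may differ.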
