Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs

Virtualized Radio Access Networks (vRANs) are fully configurable and can be implemented at low cost on commodity platforms, which enables flexible network management. In this paper, a novel vRAN reconfiguration problem is formulated to jointly reconfigure the functional splits of the base stations (BSs), the locations of the virtualized central units (vCUs) and distributed units (vDUs), their resources, and the routing of each BS data flow. The objective is to minimize the long-term total network operation cost while adapting to varying traffic demands and resource availability. Testbed measurements are performed to study the relationship between traffic demands and computing resources; they reveal that this relationship has high variance and depends on the platform and its load. Consequently, finding an accurate model of the underlying system is non-trivial. Therefore, to solve the proposed problem, a deep reinforcement learning (RL)-based framework is developed using model-free RL approaches. Moreover, the problem involves multiple BSs sharing the same resources, which results in a multi-dimensional discrete action space and a combinatorial number of possible actions. To overcome this curse of dimensionality, an action branching architecture, an action decomposition method in which a shared decision module is followed by neural network branches, is combined with the Dueling Double Deep Q-network (D3QN) algorithm. Simulations are carried out using an O-RAN compliant model and real traces from the testbed. Our numerical results show that the proposed framework successfully learns the optimal policy that adaptively selects the vRAN configurations, and that its learning convergence can be further expedited through transfer learning, even across different vRAN systems. It offers cost savings of up to 59% relative to a static benchmark, 35% relative to DDPG with discretization, and 76% relative to non-branching D3QN.
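To make the dimensionality argument concrete, the following minimal Python sketch compares the number of outputs a non-branching Q-network would need with the number needed when each BS gets its own branch, and shows one common way to combine a shared state value with per-branch advantages. It is an illustration under stated assumptions, not the authors' implementation: the function names, the numbers of BSs and per-BS choices, and the advantage values are all hypothetical, and the mean-subtracted dueling aggregation is assumed here because it is a standard choice for branching dueling networks.

```python
from statistics import mean


def joint_action_count(num_bs: int, choices_per_bs: int) -> int:
    """Outputs needed by a non-branching Q-network: one per joint action."""
    return choices_per_bs ** num_bs


def branching_output_count(num_bs: int, choices_per_bs: int) -> int:
    """Outputs needed with one branch per BS: one per (BS, choice) pair."""
    return num_bs * choices_per_bs


def branch_q_values(state_value, advantages):
    """Assumed dueling aggregation, applied independently to each branch d:
    Q_d(s, a) = V(s) + A_d(s, a) - mean over a' of A_d(s, a')."""
    return [
        [state_value + a - mean(branch) for a in branch]
        for branch in advantages
    ]


def greedy_joint_action(q_values):
    """Pick the best choice in every branch; the joint action is the tuple of picks."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_values]


# Hypothetical sizes: 10 BSs, 4 configuration choices per BS.
print(joint_action_count(10, 4))      # 1048576 joint actions
print(branching_output_count(10, 4))  # 40 branch outputs

# Hypothetical network outputs for 3 BSs with 4 choices each.
shared_value = 1.0
branch_advantages = [
    [0.5, 1.5, -0.5, 0.5],
    [2.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
]
q = branch_q_values(shared_value, branch_advantages)
print(greedy_joint_action(q))         # [1, 0, 2]
```

The joint action is assembled from independent per-branch maximizations, so the output layer grows linearly with the number of BSs rather than exponentially, which is the property the branching architecture is used for here.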
