The highly heterogeneous ecosystem of NextG wireless communication systems calls for novel networking paradigms in which functionalities and operations can be dynamically and optimally reconfigured in real time to adapt to changing traffic conditions and satisfy stringent and diverse QoS demands. Open RAN technologies, and specifically those being standardized by the O-RAN Alliance, make it possible to integrate network intelligence into the once monolithic RAN via intelligent applications, namely xApps and rApps. These applications enable flexible control of network resources and functionalities, network management, and orchestration through data-driven intelligent control loops. Recent work has shown that DRL is effective in dynamically controlling O-RAN systems. However, how to design these solutions so that they manage heterogeneous optimization goals and prevent unfair resource allocation is still an open challenge, and the logic within DRL agents is often treated as a black box. In this paper, we introduce PandORA, a framework to automatically design and train DRL agents for Open RAN applications, package them as xApps, and evaluate them in the Colosseum wireless network emulator. We benchmark $23$ xApps that embed DRL agents trained using different architectures, reward designs, action spaces, and decision-making timescales, and with the ability to hierarchically control different network parameters. We test these agents on the Colosseum testbed under diverse traffic and channel conditions, in static and mobile setups. Our experimental results indicate that suitable fine-tuning of the RAN control timers, as well as proper selection of reward designs and DRL architectures, can boost network performance according to the network conditions and demand. Notably, finer decision-making granularities can improve mMTC performance by ~56% and even increase eMBB throughput by ~99%.