The highly heterogeneous ecosystem of Next Generation (NextG) wireless communication systems calls for novel networking paradigms where functionalities and operations can be dynamically and optimally reconfigured in real time to adapt to changing traffic conditions and satisfy stringent and diverse Quality of Service (QoS) demands. Open Radio Access Network (RAN) technologies, and specifically those being standardized by the O-RAN Alliance, make it possible to integrate network intelligence into the once-monolithic RAN via intelligent applications, namely, xApps and rApps. These applications enable flexible control of network resources and functionalities, network management, and orchestration through data-driven control loops. Despite recent work demonstrating the effectiveness of Deep Reinforcement Learning (DRL) in controlling O-RAN systems, how to design these solutions so that they do not create conflicts or unfair resource allocation policies is still an open challenge. In this paper, we perform a comparative analysis where we dissect the impact of different DRL-based xApp designs on network performance. Specifically, we benchmark 12 different xApps that embed DRL agents trained using different reward functions, with different action spaces, and with the ability to hierarchically control different network parameters. We prototype and evaluate these xApps on Colosseum, the world's largest O-RAN-compliant wireless network emulator with hardware-in-the-loop. We share the lessons learned and discuss our experimental results, which demonstrate how certain design choices deliver the highest performance while others may result in competitive behavior between different classes of traffic with similar objectives.