The fifth generation (5G) of wireless networks must simultaneously support heterogeneous service categories, including Ultra-Reliable Low-Latency Communications (URLLC), enhanced Mobile Broadband (eMBB), and massive Machine-Type Communications (mMTC), each with distinct Quality of Service (QoS) requirements. Meeting these demands with limited spectrum requires adaptive, standards-compliant radio resource management. We present DORA (Dynamic O-RAN Resource Allocation), a deep reinforcement learning (DRL) framework for dynamic slice-level Physical Resource Block (PRB) allocation in Open RAN (O-RAN). DORA employs a Proximal Policy Optimization (PPO) agent to allocate PRBs across the URLLC, eMBB, and mMTC slices based on observed traffic demands and channel conditions. Within each slice, PRBs are scheduled deterministically by round-robin among active user equipments (UEs), which reduces control complexity and improves training stability. Unlike prior work, DORA supports online training and adapts continuously to evolving traffic patterns and cross-slice contention. Implemented in the standards-compliant OpenAirInterface (OAI) RAN stack and designed for deployment as an O-RAN xApp, DORA integrates with RAN Intelligent Controllers (RICs). Extensive evaluation under congested regimes shows that DORA outperforms three non-learning baselines and a \texttt{DQN} agent, achieving lower URLLC latency, higher eMBB throughput with fewer Service Level Agreement (SLA) violations, and broader mMTC coverage without starving high-priority slices. To our knowledge, this is the first fully online DRL framework for adaptive, slice-aware PRB allocation in O-RAN.
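The two-level split described above (a slice-level PRB budget chosen by the agent, then round-robin among UEs inside each slice) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_prbs` and `round_robin`, the weight-vector form of the action, the largest-remainder rounding, and the 106-PRB cell size are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

SLICES = ("URLLC", "eMBB", "mMTC")  # slice order assumed for this sketch


def split_prbs(weights: Sequence[float], total_prbs: int) -> List[int]:
    """Map non-negative slice weights (a hypothetical agent action) to
    integer PRB counts that sum exactly to total_prbs, using
    largest-remainder rounding."""
    total_w = sum(weights)
    if total_w <= 0:
        # Degenerate action: fall back to an equal split.
        shares = [total_prbs / len(weights)] * len(weights)
    else:
        shares = [w / total_w * total_prbs for w in weights]
    counts = [int(s) for s in shares]  # floor of each fractional share
    leftover = total_prbs - sum(counts)
    # Hand the remaining PRBs to the slices with the largest fractional parts.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def round_robin(ue_ids: Sequence[str], n_prbs: int, start: int = 0) -> Dict[str, int]:
    """Deterministically hand out n_prbs one at a time to the active UEs,
    beginning at index `start` and wrapping around."""
    alloc = {ue: 0 for ue in ue_ids}
    if not ue_ids:
        return alloc  # no active UEs: the slice's PRBs stay unused
    for k in range(n_prbs):
        alloc[ue_ids[(start + k) % len(ue_ids)]] += 1
    return alloc


if __name__ == "__main__":
    budget = split_prbs([5, 3, 2], total_prbs=106)  # hypothetical action and cell size
    for name, n in zip(SLICES, budget):
        print(name, n, round_robin(["ue1", "ue2", "ue3"], n))
```

Keeping the intra-slice step deterministic means the learned policy only has to choose three numbers per decision interval, which is the simplification the abstract credits for improved training stability.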