The success of immersive applications such as virtual reality (VR) gaming and metaverse services depends on low latency and reliable connectivity. To provide seamless user experiences, the open radio access network (O-RAN) architecture and 6G networks are expected to play a crucial role. RAN slicing, a critical component of the O-RAN paradigm, creates multiple virtual networks on a single physical infrastructure, so that network resources can be allocated according to the needs of immersive services. In the O-RAN literature, deep reinforcement learning (DRL) algorithms are commonly used to optimize resource allocation. However, the practical adoption of DRL in live deployments has been slow. This is primarily due to the slow convergence and performance instability that DRL agents suffer both upon initial deployment and when network conditions change significantly. In this paper, we investigate the impact of time series forecasting of traffic demands on the convergence of DRL-based slicing agents. To this end, we conduct an exhaustive experiment that supports multiple services, including real VR gaming traffic. We then propose a novel forecasting-aided DRL approach and a corresponding practical O-RAN deployment workflow to enhance DRL convergence. Compared with the implemented baselines, our approach shows improvements of up to 22.8%, 86.3%, and 300% in the average initial reward value, convergence rate, and number of converged scenarios, respectively, enhancing the generalizability of the DRL agents. The results also indicate that our approach is robust against forecasting errors and that forecasting models do not have to be ideal.