The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Within network slicing, admission control plays a crucial role in achieving service-oriented optimization goals by selectively accepting service requests. Although deep reinforcement learning (DRL) forms the foundation of many admission control approaches thanks to its effectiveness and flexibility, the initial instability and excessive convergence delay of DRL models hinder their deployment in real-world networks. We propose a digital twin (DT) accelerated DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process, which we then simplify into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. We then build a neural network-based DT with an output layer customized for queuing systems, train it through supervised learning, and use it to assist the training phase of the DRL model. Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model, while preserving the model's capability to optimize the long-term rewards of the admission process.
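To illustrate the idea of treating admission decisions as a discrete-time process, the following is a minimal sketch, not the formulation used in the paper. It assumes a single slice with Poisson arrivals, exponential service times, and a fixed number of resource units; the class name `AdmissionEnv`, the helper `run`, and all parameter values are hypothetical. Decisions are taken only at arrival instants, so indexing by arrival count gives a discrete sequence of states, while the random time between decisions (the sojourn time of the semi-Markov view) is returned separately.

```python
import random


class AdmissionEnv:
    """Toy admission queue observed only at request-arrival epochs.

    All names and default values are illustrative assumptions.
    """

    def __init__(self, capacity=5, arrival_rate=1.0, service_rate=0.3, seed=0):
        self.capacity = capacity          # resource units in the slice
        self.arrival_rate = arrival_rate  # Poisson arrival rate
        self.service_rate = service_rate  # per-request exponential service rate
        self.rng = random.Random(seed)
        self.busy = 0                     # resource units currently occupied

    def step(self, admit):
        """Apply the decision for the current request, then jump to the next arrival.

        Returns (next_state, reward, sojourn_time).
        """
        reward = 0.0
        if admit and self.busy < self.capacity:
            self.busy += 1
            reward = 1.0  # assumed unit reward per accepted request

        # Time until the next arrival: one decision epoch of the embedded chain.
        sojourn = self.rng.expovariate(self.arrival_rate)

        # Process departures that happen before the next arrival.
        # Exponential service is memoryless, so the time to the next departure
        # can be resampled after each event.
        elapsed = 0.0
        while self.busy > 0:
            dt = self.rng.expovariate(self.busy * self.service_rate)
            if elapsed + dt > sojourn:
                break
            elapsed += dt
            self.busy -= 1

        return self.busy, reward, sojourn


def run(env, policy, n_steps=1000):
    """Roll out a policy (state -> bool) and return the fraction of accepted requests."""
    state, accepted = env.busy, 0.0
    for _ in range(n_steps):
        state, reward, _ = env.step(policy(state))
        accepted += reward
    return accepted / n_steps


if __name__ == "__main__":
    always_admit = lambda state: True
    print("acceptance ratio:", run(AdmissionEnv(), always_admit))
```

A DRL agent would replace the fixed `policy` function here. In the approach the abstract describes, a DT trained by supervised learning assists that agent's training phase, which this sketch does not attempt to reproduce.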