5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while maximizing Network Service Provider (NSP) profits. However, existing Deep Reinforcement Learning (DRL) frameworks focus primarily on profit optimization without explicitly accounting for service delay, potentially leading to QoS violations for latency-sensitive slices. Moreover, the epsilon-greedy exploration commonly used in DRL often results in unstable convergence and suboptimal policy learning. To address these gaps, we propose DePSAC -- a Delay and Profit-aware Slice Admission Control scheme. Our DRL-based approach incorporates a delay-aware reward function, in which penalties incurred by service delay incentivize the prioritization of latency-critical slices such as URLLC. Additionally, we employ Boltzmann exploration to achieve smoother and faster convergence. We implement and evaluate DePSAC on a simulated 5G core network substrate with realistic Network Slice Request (NSLR) arrival patterns. Experimental results demonstrate that our method outperforms the DSARA baseline, achieving higher overall profit, lower URLLC slice delays, higher acceptance rates, and more efficient resource consumption. These findings validate the effectiveness of the proposed DePSAC in achieving better QoS-profit trade-offs for practical 5G network slicing scenarios.
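To make the two mechanisms named above concrete, the following is a minimal Python sketch of (i) a delay-aware reward that subtracts a penalty once the observed service delay exceeds a slice's delay budget, and (ii) Boltzmann (softmax) exploration over Q-values as an alternative to epsilon-greedy action selection. The function names, the linear penalty form, and all parameter values are illustrative assumptions, not the exact formulation used in DePSAC.

```python
import numpy as np

def delay_aware_reward(profit, service_delay, delay_budget, penalty_weight=1.0):
    """Illustrative reward: profit minus a penalty that grows once the
    observed service delay exceeds the slice's delay budget (e.g., a tight
    budget for URLLC). The linear penalty form is an assumption."""
    violation = max(0.0, service_delay - delay_budget)
    return profit - penalty_weight * violation

def boltzmann_action(q_values, temperature=1.0):
    """Sample an action from a softmax over Q-values instead of acting
    epsilon-greedily; lower temperature pushes the policy toward greedy."""
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Hypothetical usage: decide whether to reject (0) or admit (1) an
# incoming URLLC slice request, then score the outcome.
q = [0.4, 1.1]                      # Q(s, reject), Q(s, admit)
action = boltzmann_action(q, temperature=0.5)
r = delay_aware_reward(profit=10.0, service_delay=1.2,
                       delay_budget=1.0, penalty_weight=5.0)
print(action, r)
```

Because softmax assigns every action a non-zero probability proportional to its estimated value, exploration shifts gradually toward exploitation as Q-estimates sharpen, which is the smoother convergence behavior the abstract attributes to Boltzmann exploration.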