Network slicing enables operators to efficiently support diverse applications on a common physical infrastructure. The ever-increasing densification of network deployments leads to complex and non-trivial inter-cell interference, which makes analytic models too inaccurate to rely on for dynamically optimizing resource management for network slices. In this paper, we develop a DIRP algorithm with multiple deep reinforcement learning (DRL) agents that cooperatively optimize the resource partition in individual cells to fulfill the requirements of each slice, based on two alternative reward functions. However, existing DRL approaches usually tie the pretrained model parameters to specific network environments, which results in poor transferability and raises practical deployment concerns in large-scale mobile networks. Hence, we design a novel transfer learning-aided DIRP (TL-DIRP) algorithm that eases the transfer of DIRP agents across different network environments in terms of sample efficiency, model reproducibility, and algorithm scalability. TL-DIRP consists of two steps: 1) centralized training of a generalized distributed model, and 2) transferring this "generalist" to each local agent as a "specialist" with distributed finetuning and execution. The numerical results show not only that DIRP outperforms existing baseline approaches in terms of faster convergence and higher reward but, more importantly, that TL-DIRP significantly improves the service performance, with reduced exploration cost, accelerated convergence rate, and enhanced model reproducibility. Compared to a traffic-aware baseline, TL-DIRP achieves an approximately 15% lower quality-of-service (QoS) violation ratio for the worst slice service and an 8.8% lower violation ratio for the average service QoS.
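The two-step generalist-to-specialist idea can be illustrated with a deliberately simplified sketch. It is not the TL-DIRP algorithm: the DRL agents are replaced by a hypothetical one-feature linear model trained with plain SGD, and each cell's "environment" is assumed to be a hypothetical linear mapping from cell load to a target resource share. All names (`LinearAgent`, `make_cell_samples`, `tl_pipeline`) are illustrative. The sketch only shows the parameter-transfer pattern: train one generalist on data pooled from all cells, copy it into each cell, and finetune it with a small local budget, compared against an agent trained from scratch with the same budget.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearAgent:
    """Hypothetical stand-in for a per-cell agent: maps cell load to a resource share."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def sgd_step(self, x: float, y: float, lr: float) -> None:
        # Gradient of 0.5 * (prediction - y)^2 with respect to w and b.
        err = self.predict(x) - y
        self.w -= lr * err * x
        self.b -= lr * err

    def copy(self) -> "LinearAgent":
        return LinearAgent(self.w, self.b)


def mse(agent: LinearAgent, samples: list[tuple[float, float]]) -> float:
    return sum((agent.predict(x) - y) ** 2 for x, y in samples) / len(samples)


def make_cell_samples(rng: random.Random, w_true: float, b_true: float, n: int):
    # Assumed toy environment: the target share depends linearly on load x in [0, 1).
    return [(x, w_true * x + b_true) for x in (rng.random() for _ in range(n))]


def train(agent: LinearAgent, samples, steps: int, lr: float, rng: random.Random):
    for _ in range(steps):
        x, y = rng.choice(samples)
        agent.sgd_step(x, y, lr)
    return agent


def tl_pipeline(cells, pretrain_steps=2000, finetune_steps=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    pooled = [s for samples in cells.values() for s in samples]
    # Step 1: centralized training of a single "generalist" on data from all cells.
    generalist = train(LinearAgent(), pooled, pretrain_steps, lr, rng)
    results = {}
    for cell, samples in cells.items():
        # Step 2: each "specialist" starts from a copy of the generalist
        # and is finetuned only on its own cell's data.
        specialist = train(generalist.copy(), samples, finetune_steps, lr, rng)
        # Baseline: same local training budget, but starting from scratch.
        scratch = train(LinearAgent(), samples, finetune_steps, lr, rng)
        results[cell] = (mse(specialist, samples), mse(scratch, samples))
    return generalist, results


if __name__ == "__main__":
    data_rng = random.Random(42)
    cells = {
        "cell_A": make_cell_samples(data_rng, 0.8, 0.10, 200),
        "cell_B": make_cell_samples(data_rng, 0.7, 0.15, 200),
        "cell_C": make_cell_samples(data_rng, 0.9, 0.05, 200),
    }
    _, results = tl_pipeline(cells)
    for cell, (spec, scratch) in results.items():
        print(f"{cell}: transferred={spec:.4f} scratch={scratch:.4f}")
```

Under these toy assumptions, the cells are similar but not identical, so the generalist already starts close to each cell's optimum and the transferred specialists typically reach a lower loss than scratch agents given the same 20 local steps. This only mirrors, in miniature, the reduced exploration cost and faster convergence described above; it says nothing about the reported QoS results.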