Network slicing-based communication systems can allocate resources dynamically and efficiently for diversified services. However, because the network interface limits channel access and resource allocation is complex, it is challenging to reach an acceptable solution in a practical system without precise prior knowledge of the probabilistic model governing service-request dynamics. Existing work attempts to solve this problem with deep reinforcement learning (DRL), but such methods usually require extensive interaction with the real environment to achieve good results. In this paper, a framework consisting of a digital twin and reinforcement learning agents is presented to address this issue. Specifically, we use historical data and neural networks to build a digital twin model that simulates the state variation law of the real environment. Then, we use data generated by the network slicing environment to calibrate the digital twin so that it stays in sync with the real environment. Finally, the DRL agent for slice optimization improves its performance in this virtual pre-verification environment. We conduct a comprehensive verification of the proposed digital twin framework to confirm its scalability. In particular, we use loss landscapes to visualize the generalization of DRL solutions, and we explore a distillation-based optimization scheme for lightweight slicing strategies. In addition, we extend the framework to offline reinforcement learning, where solutions can be used to obtain intelligent decisions based solely on historical data. Numerical simulation experiments show that the proposed digital twin can significantly improve the performance of the slice optimization strategy.
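The three-stage workflow described in the abstract (build a twin from historical data, calibrate it against the live environment, then optimize the policy inside the twin) can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration and not the paper's method: the "real" slice demand follows a hypothetical noisy linear law, the twin is a least-squares linear fit rather than a neural network, and the policy search is an exhaustive scan over fixed capacities rather than a DRL agent. All function and parameter names are hypothetical.

```python
import random


def fit_twin(transitions):
    """Least-squares fit of next = a * cur + b from (cur, next) pairs.

    Stand-in for the neural-network twin; a linear model is assumed here.
    """
    n = len(transitions)
    mean_x = sum(x for x, _ in transitions) / n
    mean_y = sum(y for _, y in transitions) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in transitions)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in transitions)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def real_step(demand, rng, a=0.8, b=2.0, noise=0.1):
    """Hypothetical 'real' slice-demand dynamics, unknown to the optimizer."""
    return a * demand + b + rng.gauss(0.0, noise)


def collect(n, rng, start=5.0, **dynamics):
    """Log n (demand, next_demand) transitions from the real environment."""
    data, demand = [], start
    for _ in range(n):
        nxt = real_step(demand, rng, **dynamics)
        data.append((demand, nxt))
        demand = nxt
    return data


def twin_return(twin, capacity, start=5.0, horizon=50, cost=0.3):
    """Roll out a fixed-capacity policy inside the twin (no real interaction)."""
    a, b = twin
    demand, total = start, 0.0
    for _ in range(horizon):
        served = min(demand, capacity)
        total += served - cost * capacity  # served demand minus resource cost
        demand = a * demand + b            # the twin predicts the next state
    return total


def best_capacity(twin, candidates=range(1, 31)):
    """Exhaustive search in the twin; stands in for DRL training."""
    return max(candidates, key=lambda c: twin_return(twin, c))


rng = random.Random(0)

# 1) Build the twin from historical data.
stale_twin = fit_twin(collect(200, rng))
stale_choice = best_capacity(stale_twin)

# 2) Calibrate: the real dynamics have drifted (b: 2.0 -> 3.0), so refit on fresh data.
calibrated_twin = fit_twin(collect(200, rng, b=3.0))

# 3) Optimize the policy inside the calibrated twin.
calibrated_choice = best_capacity(calibrated_twin)

print("stale twin (a, b):", stale_twin, "-> capacity", stale_choice)
print("calibrated twin (a, b):", calibrated_twin, "-> capacity", calibrated_choice)
```

In this toy setting the calibration step matters because the policy chosen in the stale twin under-provisions capacity once demand drifts upward; refitting the twin on fresh transitions lets the search adapt without further trial and error in the real environment.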