In this paper, we present a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users. In particular, we propose a novel deep RL framework with multiple actors and a centralized critic (MACC), in which the actors are implemented as pointer networks to accommodate the varying dimension of the input. We evaluate the performance of the proposed deep RL algorithm via simulations to demonstrate its effectiveness. Subsequently, we develop a deep-RL-based jammer with limited prior information and a limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the performance of the network slicing agents. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, where it generates interference attacks in the optimized set of channels by switching between the jamming phase and the listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge of the network slicing policies. Finally, we devise a Nash-equilibrium-supervised policy ensemble mixed strategy profile for network slicing (as a defensive measure) and for jamming. We evaluate the performance of the proposed policy ensemble algorithm by applying it to the network slicing agents and the jammer agent in simulations to show its effectiveness.
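The actors are described as pointer networks so that they can handle a varying input dimension, such as a changing number of users. The snippet below is a minimal sketch of one decoding step. It assumes the standard additive attention from the original pointer-network formulation, not the paper's exact architecture. The point it illustrates is that the parameters `w1`, `w2`, and `v` have fixed shapes, yet they score any number of input elements and produce a distribution over them. The feature and hidden sizes, the random weights, the per-user feature vectors, and the helper names are all hypothetical.

```python
import math
import random


def pointer_attention(encoder_states, query, w1, w2, v):
    """Return a probability distribution over the input elements.

    Additive attention as in pointer networks (assumed form):
        u_j = v^T tanh(W1 e_j + W2 q),   p = softmax(u)
    w1, w2, and v have fixed sizes, so the same weights can score
    any number of encoder states (e.g. a varying number of users).
    """
    def matvec(m, x):
        return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

    wq = matvec(w2, query)  # query projection, shared by every input
    scores = []
    for e in encoder_states:
        we = matvec(w1, e)
        hidden = [math.tanh(a + b) for a, b in zip(we, wq)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over a variable-length score list.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def random_matrix(rng, rows, cols):
    """Hypothetical helper: uniform random weights or features."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
FEAT, HID = 4, 8  # hypothetical per-user feature size and hidden size
w1 = random_matrix(rng, HID, FEAT)
w2 = random_matrix(rng, HID, FEAT)
v = [rng.uniform(-1, 1) for _ in range(HID)]
query = [rng.uniform(-1, 1) for _ in range(FEAT)]  # decoder state

# The same fixed-size parameters handle 3 users and 7 users.
for n_users in (3, 7):
    users = random_matrix(rng, n_users, FEAT)
    probs = pointer_attention(users, query, w1, w2, v)
    print(n_users, len(probs), round(sum(probs), 6))
```

A feed-forward actor with a fixed-width input layer would need padding or retraining when the number of users changes. Here the output length simply follows the input length. This is the property that motivates using pointer networks for the actors.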