FGGM: Formal Grey-box Gradient Method for Attacking DRL-based MU-MIMO Scheduler

In 5G mobile communication systems, multi-user multiple-input multiple-output (MU-MIMO) is used to enhance spectral efficiency and support high data rates. To maximize spectral efficiency while providing fairness among users, the base station (BS) needs to select a subset of users for data transmission. Because this problem is NP-hard, methods based on deep reinforcement learning (DRL) have been proposed to infer near-optimal solutions in real time, yet this approach has an intrinsic security problem. This paper investigates how a group of adversarial users can exploit unsanitized raw channel state information (CSI) to launch a throughput degradation attack. Most existing studies focus only on systems in which adversarial users can obtain the exact values of victims' CSIs, but this is impractical for uplink transmission in LTE/5G mobile systems. We note that the DRL policy contains an observation normalizer, which stores the mean and variance of the observations to improve training convergence. Adversarial users can then estimate upper and lower bounds on the local observations, including the victims' CSIs, based solely on that observation normalizer. We develop an attack scheme, FGGM, by leveraging polytope abstract domains, a technique used to bound the outputs of a neural network given its input ranges. Our goal is to find a single set of intentionally manipulated CSIs that achieves the attack goals over the whole range of the victims' local observations. Experimental results demonstrate that FGGM can determine a set of adversarial CSI vectors controlled by the adversarial users and then reuse those CSIs throughout the simulation to reduce the network throughput of a victim by up to 70% without knowing the exact values of the victims' local observations. This work serves as a case study, and the approach can be applied to many other DRL-based problems, such as knapsack-oriented resource allocation problems.
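The two ingredients named in the abstract, input ranges derived from the normalizer statistics and output bounds of a network over those ranges, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the bounds are taken as the mean plus or minus `k` standard deviations (`k` is a hypothetical parameter), the policy is a toy fully connected ReLU network, and plain interval arithmetic stands in for the tighter polytope abstract domain that FGGM uses. All function names are illustrative.

```python
import math


def observation_bounds(mean, var, k=3.0, eps=1e-8):
    """Assumed box around each feature: mean +/- k standard deviations."""
    std = [math.sqrt(v + eps) for v in var]
    lo = [m - k * s for m, s in zip(mean, std)]
    hi = [m + k * s for m, s in zip(mean, std)]
    return lo, hi


def affine_bounds(W, b, lo, hi):
    """Sound interval bounds for y = W x + b when lo <= x <= hi."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        y_lo = y_hi = bias
        for w, x_lo, x_hi in zip(row, lo, hi):
            if w >= 0:
                y_lo += w * x_lo
                y_hi += w * x_hi
            else:
                # A negative weight swaps which endpoint gives the extreme.
                y_lo += w * x_hi
                y_hi += w * x_lo
        out_lo.append(y_lo)
        out_hi.append(y_hi)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def network_output_bounds(layers, lo, hi):
    """Propagate a box through (W, b) layers with ReLU between them."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # no activation after the last layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


# Toy example with made-up statistics and weights (not from the paper).
obs_lo, obs_hi = observation_bounds([0.0, 1.0], [1.0, 4.0], k=2.0, eps=0.0)
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.5]),
]
out_lo, out_hi = network_output_bounds(toy_layers, obs_lo, obs_hi)
```

In this toy case the observation box is [-2, 2] x [-3, 5] and the output is guaranteed to lie in [-3.0, 5.5] for every observation in that box. Interval bounds are sound but often loose; polytope domains keep linear relations between neurons and typically give tighter ranges, which matters when an attack must hold over the whole observation range.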
