Differential privacy (DP) has recently been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard agents' privacy against adversarial inference during knowledge sharing. We argue, however, that the noise introduced by DP mechanisms may inadvertently give rise to a novel poisoning threat in private knowledge sharing during CMARL, a threat that remains unexplored in the literature. To address this gap, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP noise to circumvent anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate PeLPA in diverse environments, encompassing both non-adversarial and multiple-adversary settings. Our findings reveal that, in a medium-scale environment, PeLPA with attacker ratios of 20% and 40% can increase the average steps to goal by 50.69% and 64.41%, respectively. Under similar conditions, PeLPA can increase the computational time needed to attain the optimal reward by 1.4x and 1.6x, and slow convergence by 1.18x and 1.38x, for attacker ratios of 20% and 40%, respectively.