Recently, differential privacy (DP) has been introduced into cooperative multiagent reinforcement learning (CMARL) to protect agents' privacy against adversarial inference during knowledge sharing. However, we argue that the noise introduced by DP mechanisms may inadvertently create a novel poisoning threat specific to private knowledge sharing in CMARL, one that remains unexplored in the literature. To address this gap, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that exploits the inherent DP noise to evade anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate PeLPA in diverse environments, covering both non-adversarial and multiple-adversarial settings. Our findings show that, in a medium-scale environment, PeLPA with attacker ratios of 20% and 40% increases the average number of steps to reach the goal by 50.69% and 64.41%, respectively. Furthermore, under similar conditions and for attacker ratios of 20% and 40%, respectively, PeLPA increases the computational time needed to attain the optimal reward by 1.4x and 1.6x and slows convergence by 1.18x and 1.38x.