Adversarial Attack on Black-Box Multi-Agent by Adaptive Perturbation

Evaluating the security and reliability of multi-agent systems (MAS) is urgent as they become increasingly prevalent in various applications. As an evaluation technique, existing adversarial attack frameworks face several limitations. They are often impractical because they require white-box information or high control authority, and they lack stealthiness or effectiveness because they typically target all agents or a fixed set of agents. To address these issues, we propose AdapAM, a novel framework for adversarial attacks on black-box MAS. AdapAM incorporates two key components: (1) an Adaptive Selection Policy, which simultaneously selects the victim and determines the anticipated malicious action (the action that would have the worst impact on the MAS), balancing effectiveness and stealthiness; and (2) Proxy-based Perturbation to Induce Malicious Action, which uses generative adversarial imitation learning to approximate the target MAS. This proxy lets AdapAM generate perturbed observations using white-box information and thus induce victims to execute the malicious action in black-box settings. We evaluate AdapAM across eight multi-agent environments and compare it with four state-of-the-art and commonly used baselines. Results demonstrate that AdapAM achieves the best attack performance across different perturbation rates. Moreover, AdapAM-generated perturbations are the least noisy and the hardest to detect, underscoring their stealthiness.
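
To make the proxy-based perturbation step more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify how AdapAM computes its perturbations. Here we assume the proxy for one victim is a linear softmax policy; in AdapAM this proxy would be learned with generative adversarial imitation learning, but here it is simply a fixed toy matrix. We also swap in a standard targeted projected-gradient (PGD-style) attack under an L-infinity budget. The victim and the malicious action are taken as given, standing in for the output of the Adaptive Selection Policy. All function names (`proxy_logits`, `grad_log_prob`, `targeted_pgd`) and parameters (`eps`, `step`, `iters`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over action logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_logits(W: Matrix, b: Vector, obs: Vector) -> Vector:
    """Hypothetical linear proxy policy for one victim: logits = W @ obs + b."""
    return [sum(w * o for w, o in zip(row, obs)) + bias for row, bias in zip(W, b)]


def grad_log_prob(W: Matrix, b: Vector, obs: Vector, target: int) -> Vector:
    """Gradient of log pi_proxy(target | obs) with respect to obs.

    For a linear softmax policy this equals W^T (onehot(target) - p).
    """
    p = softmax(proxy_logits(W, b, obs))
    coeff = [(1.0 if a == target else 0.0) - p_a for a, p_a in enumerate(p)]
    return [sum(coeff[a] * W[a][j] for a in range(len(W))) for j in range(len(obs))]


def targeted_pgd(W: Matrix, b: Vector, obs: Vector, target: int,
                 eps: float = 0.3, step: float = 0.05, iters: int = 20) -> Vector:
    """Perturb obs inside an L-infinity ball of radius eps so the proxy prefers `target`.

    Only the proxy's (white-box) gradients are used; the real victim is never queried.
    """
    adv = list(obs)
    for _ in range(iters):
        g = grad_log_prob(W, b, adv, target)
        # Signed gradient ascent on the target action's log-probability:
        # (gj > 0) - (gj < 0) evaluates to sign(gj) as an int.
        adv = [x + step * ((gj > 0) - (gj < 0)) for x, gj in zip(adv, g)]
        # Project back into the eps-ball around the clean observation.
        adv = [min(max(x, o - eps), o + eps) for x, o in zip(adv, obs)]
    return adv


def argmax(xs: Vector) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


# Toy usage: 3 actions, 2-dimensional observation (all numbers are made up).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
obs = [0.2, 0.1]
malicious_action = 1  # assumed to be chosen by the (hypothetical) selection policy
adv = targeted_pgd(W, b, obs, target=malicious_action)
print(argmax(proxy_logits(W, b, obs)), argmax(proxy_logits(W, b, adv)))  # 0 1
```

The clean observation makes the proxy choose action 0. The bounded perturbation flips it to the chosen malicious action 1 while every coordinate stays within `eps` of the original. Whether such a perturbation also fools the real black-box victim depends on how closely the imitation-learned proxy matches it.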
