Collaborative multi-agent reinforcement learning has evolved rapidly, offering state-of-the-art algorithms for real-world applications, including sensitive domains. However, a key obstacle to its widespread adoption is the lack of a thorough investigation of its vulnerability to adversarial attacks. Existing work predominantly focuses on training-time attacks or on unrealistic scenarios, such as access to policy weights or the ability to train surrogate policies. In this paper, we investigate new vulnerabilities under more challenging and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents. We also consider scenarios in which the adversary has no access at all (no observations, actions, or weights). Our main approach is to generate perturbations that intentionally misalign how victim agents perceive their environment. The approach is empirically validated on three benchmarks spanning 22 environments, demonstrating its effectiveness across diverse algorithms and tasks. Furthermore, we show that our algorithm is sample-efficient, requiring only 1,000 samples where previous methods need millions.
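To make the threat model concrete, the sketch below illustrates the general shape of an observation-space attack at deployment time: a bounded perturbation is added to what a victim agent observes before its policy acts, while the true environment state remains unchanged. The function name `perturb_observation`, the L-infinity budget `epsilon`, and the random stand-in for a learned perturbation are illustrative assumptions, not the actual attack algorithm described in this paper.

```python
import numpy as np

def perturb_observation(obs, delta, epsilon=0.05):
    """Apply a budget-bounded perturbation to a victim agent's observation.

    obs:     the clean observation collected from a deployed agent
    delta:   an adversarially chosen perturbation (same shape as obs)
    epsilon: L-infinity budget limiting how far the perturbed observation
             may deviate from the clean one
    """
    delta = np.clip(delta, -epsilon, epsilon)  # respect the attack budget
    return obs + delta                         # this is what the victim policy sees


# Toy usage: the deployed agent acts on a perturbed view of the environment.
rng = np.random.default_rng(0)
clean_obs = rng.normal(size=8)                 # hypothetical 8-dimensional observation
adv_delta = rng.normal(scale=0.1, size=8)      # stand-in for a learned perturbation
attacked_obs = perturb_observation(clean_obs, adv_delta)
```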