Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents

Christian Schroeder de Witt, Klaudia Krawiecka, Igor Krawczuk, Ben Hagag, William L. Anderson, Peter Belcak, Ben Bucknall, Xiaohong Cai, Ayush Chopra, Doron Cohen, Ron F. Del Rosario, Andis Draguns, Annie Gray, Keren Katz, Vasilios Mavroudis, Jaron Mink, Sumeet Ramesh Motwani, Jonathan Petit, Leif-Sebastian Rembeck, Chandler Smith, John Sotiropoulos, Steven Young, Sarah Scheffler, Mary Llewellyn

AI agents are beginning to interact with each other directly and across internet platforms and physical environments, creating security challenges beyond traditional cybersecurity and AI safety frameworks. Free-form protocols are essential for AI's task generalization but enable new threats such as secret collusion and coordinated swarm attacks. Network effects can rapidly spread privacy breaches, disinformation, jailbreaks, and data poisoning, while multi-agent dispersion and stealth optimization help adversaries evade oversight, creating novel persistent threats at a systemic level. Despite their critical importance, these security challenges remain understudied, with research fragmented across disparate fields including AI security, multi-agent learning, complex systems, cybersecurity, game theory, distributed systems, and technical AI governance. We introduce multi-agent security, a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions (whether direct or indirect via shared environments) with each other, humans, and institutions. We also characterize fundamental security-utility and security-security trade-offs across both distributed and decentralized settings. Our preliminary work (1) taxonomizes the threat landscape arising from interacting AI agents, (2) offers applications to multi-agent security for work across diffuse subfields, and (3) proposes a unified research agenda addressing open challenges in designing secure agent systems and interaction environments. By identifying these gaps, we aim to guide research in this critical area to unlock the socioeconomic potential of large-scale agent deployment, foster public trust, and mitigate national security risks in critical infrastructure and defense contexts.
