Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert,Abhay Puri,Chandra Kiran Reddy Evuru,Nazanin Sepahvand,Nicolas Chapados,Quentin Cappart,Alexandre Lacoste,Krishnamurthy Dj Dvijotham,Alexandre Drouin

from arxiv, 27 pages

While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical security vulnerabilities within the agentic AI supply chain. We show that adversaries can effectively poison the data collection pipeline at multiple stages to embed hard-to-detect backdoors that, when triggered, cause unsafe or malicious behavior. We formalize three realistic threat models across distinct layers of the supply chain: direct poisoning of finetuning data, pre-backdoored base models, and environment poisoning, a novel attack vector that exploits vulnerabilities specific to agentic training pipelines. Evaluated on two widely adopted agentic benchmarks, all three threat models prove effective: poisoning only a small number of demonstrations is sufficient to embed a backdoor that causes an agent to leak confidential user information with over 80\% success.

翻译：尽管在交互数据（如网络浏览或工具使用）上微调AI智能体可提升其能力，但这也为智能体AI供应链引入了关键安全漏洞。我们证明，攻击者可在多个阶段有效污染数据收集流水线，嵌入难以检测的后门——当这些后门被触发时，将导致不安全或恶意行为。我们形式化定义了供应链不同层级的三种现实威胁模型：对微调数据的直接投毒、预植入后门的基础模型，以及环境投毒——一种利用智能体训练流水线特有漏洞的新型攻击向量。在两种广泛采用的智能体基准测试中评估表明，所有三种威胁模型均有效：仅需投毒少量示范数据即可嵌入后门，导致智能体以超过80%的成功率泄露用户机密信息。

相关内容

关注 7108

人工智能杂志AI(Artificial Intelligence)是目前公认的发表该领域最新研究成果的主要国际论坛。该期刊欢迎有关AI广泛方面的论文，这些论文构成了整个领域的进步，也欢迎介绍人工智能应用的论文，但重点应该放在新的和新颖的人工智能方法如何提高应用领域的性能，而不是介绍传统人工智能方法的另一个应用。关于应用的论文应该描述一个原则性的解决方案，强调其新颖性，并对正在开发的人工智能技术进行深入的评估。官网地址：http://dblp.uni-trier.de/db/journals/ai/

保护网络物理系统中的 AI 智能体：关于环境交互、深度伪造威胁及其防御技术的综述

专知会员服务

10+阅读 · 2月15日

计算机视觉领域的后门攻击与防御：综述

专知会员服务

19+阅读 · 2025年9月13日

深度学习中的架构后门：漏洞、检测与防御综述

专知会员服务

12+阅读 · 2025年7月19日

【新书】利用生成式人工智能进行网络防御策略

专知会员服务

31+阅读 · 2024年10月18日