The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. AI Trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways or allow a malicious actor to hijack the model at will. This multi-year initiative mapped the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention from the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
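To make the trigger-inversion idea mentioned above concrete, the following is a minimal, hypothetical sketch on a toy linear classifier, not any detector from the TrojAI program itself. All names (`invert_trigger`, the planted backdoor feature, the penalty weight) are illustrative assumptions: the detector gradient-descends a sparse input perturbation toward a target class, and a small-norm perturbation that flips most clean inputs is evidence of a backdoor.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def invert_trigger(w, b, X, target=1.0, lam=0.01, lr=0.1, steps=300):
    """Gradient-descend a sparse input perturbation (a candidate trigger)
    that pushes the linear model's output toward the target class.
    lam weights an L1 penalty that keeps the recovered trigger small."""
    delta = np.zeros_like(w)
    for _ in range(steps):
        p = sigmoid((X + delta) @ w + b)
        # Gradient of cross-entropy toward `target`, plus L1 sparsity term.
        grad = np.mean(p - target) * w + lam * np.sign(delta)
        delta -= lr * grad
    return delta

# Toy "backdoored" classifier: feature 3 carries a planted trigger weight.
d = 10
w = rng.normal(0.0, 0.1, size=d)
w[3] = 4.0           # backdoor direction (illustrative plant)
b = -3.0             # clean inputs land firmly in class 0
X = rng.uniform(0.0, 1.0, size=(200, d))
X[:, 3] = 0.0        # the trigger feature is absent in clean data

delta = invert_trigger(w, b, X)
flip_rate = float(np.mean(sigmoid((X + delta) @ w + b) > 0.5))
suspect_feature = int(np.argmax(np.abs(delta)))
# A small perturbation that flips nearly all clean inputs flags the model;
# its dominant component points at the planted backdoor feature.
print(flip_rate, suspect_feature)
```

The design choice to show here is the sparsity penalty: without it, any sufficiently large perturbation flips a classifier, so it is the combination of a *small* recovered trigger and a *high* flip rate that separates backdoored models from clean ones.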