Red Teaming AI Red Teaming

Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI red teaming. We argue that despite its current popularity in AI governance, there exists a significant gap between red teaming's original intent as a critical thinking exercise and its narrow focus on discovering model-level flaws in the context of generative AI. Current AI red teaming efforts focus predominantly on individual model vulnerabilities while overlooking the broader sociotechnical systems and emergent behaviors that arise from complex interactions between models, users, and environments. To address this deficiency, we propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming spanning the entire AI development lifecycle, and micro-level model red teaming. Drawing on cybersecurity experience and systems theory, we further propose a set of six recommendations. In these, we emphasize that effective AI red teaming requires multifunctional teams that examine emergent risks, systemic vulnerabilities, and the interplay between technical and social factors.


