Red-Teaming for Generative AI: Silver Bullet or Security Theater?

In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what precisely it means, what role it can play in regulation, and how it relates to conventional red-teaming practices as originally conceived in the field of cybersecurity. In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices. Our analysis reveals that prior methods and practices of AI red-teaming diverge along several axes, including the purpose of the activity (which is often vague), the artifact under evaluation, the setting in which the activity is conducted (e.g., actors, resources, and methods), and the resulting decisions it informs (e.g., reporting, disclosure, and mitigation). In light of our findings, we argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, and while industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI, gestures towards red-teaming (based on public definitions) as a panacea for every possible risk verge on security theater. To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.

