AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications

Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART), an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that significantly reduce human effort and enable adversarial testing to be integrated earlier in new product development. AART generates evaluation datasets with high diversity of the content characteristics critical for effective adversarial testing (e.g., sensitive and harmful concepts specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes that define, scope, and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared with some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality.
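The abstract describes recipes that define and scope diversity axes, which then feed a structured LLM-generation process. The Python sketch below illustrates one way such a step could look, assuming each diversity axis is a plain list of values and that generation instructions are produced by crossing those values into a fixed template. The axis names and values, the template wording, the example application, and the helpers `build_generation_prompts` and `concept_coverage` are all hypothetical illustrations, not AART's published recipes or code. The coverage function is a simple set-overlap proxy, not the paper's evaluation metric.

```python
from itertools import product

# Hypothetical diversity axes. In AART such lists are derived with AI assistance
# for a given application context; these specific values are illustrative only.
AXES = {
    "concept": ["hate speech", "dangerous advice", "misinformation"],
    "region": ["Southeast Asia", "Western Europe", "Sub-Saharan Africa"],
    "task_format": ["product question", "story request"],
}

# Hypothetical instruction template for a hypothetical travel-booking assistant.
TEMPLATE = (
    "Write an adversarial {task_format} for a travel-booking assistant that "
    "probes {concept} in a way specific to {region}."
)


def build_generation_prompts(axes, template):
    """Return (slots, instruction) pairs, one per combination of axis values."""
    names = list(axes)
    prompts = []
    # product() yields every combination; the first axis varies slowest.
    for values in product(*(axes[name] for name in names)):
        slots = dict(zip(names, values))
        prompts.append((slots, template.format(**slots)))
    return prompts


def concept_coverage(examples, target_concepts):
    """Fraction of target concepts that appear in at least one example's metadata."""
    covered = {ex["concept"] for ex in examples} & set(target_concepts)
    return len(covered) / len(target_concepts)


if __name__ == "__main__":
    prompts = build_generation_prompts(AXES, TEMPLATE)
    print(len(prompts))
    print(prompts[0][1])
    print(concept_coverage([slots for slots, _ in prompts], AXES["concept"]))
```

Crossing every axis grows multiplicatively (3 × 3 × 2 = 18 instructions here), so a practical pipeline would likely prioritize or sample combinations rather than enumerate them all; keeping the slot values alongside each instruction is what makes a coverage check like the one above possible after generation.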

