Adversarial benchmarks validate model abilities by providing samples that fool models but not humans. However, despite the proliferation of datasets that claim to be adversarial, there is no established metric for evaluating how adversarial these datasets actually are. To address this lacuna, we introduce ADVSCORE, a metric that quantifies how adversarial and discriminative an adversarial dataset is and exposes the features that make data adversarial. We then use ADVSCORE to underpin a dataset creation pipeline that incentivizes writing high-quality adversarial datasets. As a proof of concept, we use ADVSCORE to collect an adversarial question answering (QA) dataset, ADVQA, from our pipeline. The high-quality questions in ADVQA surpass three adversarial benchmarks across domains at fooling several models but not humans. We validate this result using difficulty estimates derived from 9,347 human responses on four datasets and predictions from three models. Moreover, ADVSCORE uncovers which adversarial tactics used by human writers fool models (e.g., GPT-4) but not humans. Through ADVSCORE and its analyses, we offer guidance on revealing language model vulnerabilities and producing reliable adversarial examples.