AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

Deep neural networks are known to be vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) systems built on deep neural networks perform robustly in controlled scenarios, many studies confirm that they are vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, we develop an open-source adversarial attack dataset for speaker verification research. As an initial step, we focus on the over-the-air attack. An over-the-air adversarial attack involves a perturbation generation algorithm, a loudspeaker, a microphone, and an acoustic environment, and variations in these recording configurations make previous research very challenging to reproduce. The AdvSV dataset is built on the VoxCeleb1 verification test set: representative ASV models are subjected to adversarial attacks, and the resulting adversarial samples are recorded to simulate over-the-air attack settings. The scope of the dataset can easily be extended to include more types of adversarial attacks. The dataset will be released to the public under the CC-BY license. We also provide a detection baseline for reproducible research.
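To make the components of an over-the-air attack concrete, the following is a minimal sketch and not the pipeline used to build AdvSV. It rests on several assumptions that do not come from the abstract: a toy linear embedding with cosine scoring stands in for a real ASV model, a single FGSM step stands in for the perturbation generation algorithm, and a synthetic impulse response plus Gaussian noise stands in for the loudspeaker, microphone, and acoustic environment. All function names are hypothetical.

```python
import numpy as np


def cosine_score(W, x, enroll_emb):
    """Toy verification score: cosine similarity between the linear
    embedding W @ x and the enrolled speaker embedding (assumed model)."""
    e = W @ x
    return float(e @ enroll_emb / (np.linalg.norm(e) * np.linalg.norm(enroll_emb)))


def score_gradient(W, x, enroll_emb):
    """Analytic gradient of the cosine score with respect to the waveform x."""
    e = W @ x
    ne = np.linalg.norm(e)
    nt = np.linalg.norm(enroll_emb)
    cos = e @ enroll_emb / (ne * nt)
    grad_e = enroll_emb / (ne * nt) - cos * e / ne**2
    return W.T @ grad_e


def fgsm_perturb(W, x, enroll_emb, epsilon):
    """One FGSM step that pushes the score toward acceptance.
    The perturbation is bounded by epsilon in the L-infinity norm."""
    return x + epsilon * np.sign(score_gradient(W, x, enroll_emb))


def synthetic_rir(length, rng, decay=10.0):
    """Hypothetical room impulse response: direct path plus decaying noise."""
    rir = 0.3 * rng.normal(size=length) * np.exp(-np.arange(length) / decay)
    rir[0] = 1.0
    return rir


def simulate_over_the_air(x, rir, noise_std, rng):
    """Stand-in for playback and re-recording: convolve with the impulse
    response, truncate to the input length, and add background noise."""
    y = np.convolve(x, rir)[: len(x)]
    return y + rng.normal(0.0, noise_std, size=len(x))


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 400))        # toy embedding matrix (assumption)
waveform = rng.normal(size=400)      # placeholder for a test utterance
enrolled = rng.normal(size=8)        # placeholder enrolled embedding

adversarial = fgsm_perturb(W, waveform, enrolled, epsilon=1e-3)
recorded = simulate_over_the_air(adversarial, synthetic_rir(64, rng), 0.01, rng)

print("clean score:       ", cosine_score(W, waveform, enrolled))
print("adversarial score: ", cosine_score(W, adversarial, enrolled))
print("over-the-air score:", cosine_score(W, recorded, enrolled))
```

The channel step is where reproducibility becomes hard: changing the impulse response or the noise level changes what the verifier finally receives, so two studies that use the same perturbation algorithm can still evaluate different signals. A shared set of recorded samples removes that source of variation.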

