MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models depends on the quality and diversity of their training data. Presently, the available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 9, created using 140 TTS models, comprising 78 different architectures, to generate 678.3 hours of synthetic voice in 51 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD and observe that, as a training resource, it yields better performance than comparable datasets such as InTheWild and Fake-Or-Real. Moreover, compared to the widely used ASVspoof 2019 dataset, MLAAD proves to be a complementary resource: in tests across eight datasets, models trained on MLAAD and on ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing MLAAD and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, and to contribute to global efforts against audio spoofing and deepfakes.
