This paper presents version 10 of the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), a dataset of synthetic audio for training and evaluating audio deepfake detection models. It features audio generated by 175 Text-to-Speech (TTS) models and comprises 1002.9 hours of synthetic speech in 54 languages. To evaluate the dataset, we train three state-of-the-art deepfake detection models on MLAAD and observe that, as a training resource, it yields better performance than comparable datasets such as InTheWild and FakeOrReal. Compared to the widely used ASVspoof 2019 dataset, MLAAD proves to be a complementary resource: in tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperform each other, each excelling on four. By publishing the dataset and making a trained model available through an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists and contributing to global efforts against audio spoofing and deepfakes.