Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset that covers additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate the fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection, but also for identifying the algorithms that generated fake utterances, for audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods that generalize well remain challenging. The FAD dataset is publicly available.
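As an illustration of what adding noise at a given signal-to-noise ratio involves, the following is a minimal sketch and not the procedure used to build FAD. The function name `mix_at_snr`, the synthetic placeholder signals, and the five SNR values are all assumptions; the abstract does not state which SNR levels or noise datasets were used.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so that it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose gain g so that 10*log10(speech_power / (g**2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Placeholder signals standing in for an utterance and a noise recording.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)

# Hypothetical SNR levels, chosen only for illustration.
noisy_versions = {snr_db: mix_at_snr(speech, noise, snr_db) for snr_db in (0, 5, 10, 15, 20)}
```

Scaling the noise rather than the speech keeps the utterance level unchanged across conditions, which is one common convention; other pipelines normalize the mixture afterwards.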