Recent developments in text-to-speech (TTS) have made voice cloning (VC) more realistic, affordable, and accessible. This has opened the door to many abuses of the technology, such as the deepfake robocall that imitated Joe Biden's voice in New Hampshire. Several methods have been proposed to detect such clones. However, these methods have been trained and evaluated on relatively clean databases. Recently, the ASVspoof 5 Challenge introduced a new crowd-sourced database covering diverse acoustic conditions, including various spoofing attacks and codec conditions. This paper is our submission to the ASVspoof 5 Challenge. It investigates the performance of an audio spoof detection system, trained using data augmentation through laundering attacks, on the ASVspoof 5 database. The results show that our system performs worst on the A18, A19, A20, A26, and A30 spoofing attacks and under the codec and compression conditions C08, C09, and C10.