The replay attack is one of the simplest and most effective voice spoofing attacks. According to the Automatic Speaker Verification Spoofing and Countermeasures Challenge 2021 (ASVspoof 2021), detecting replay attacks is challenging because they involve a loudspeaker, a microphone, and acoustic conditions (e.g., background noise). One obstacle to detection is finding robust feature representations that reflect the channel noise information added to the replayed speech. This study proposes a feature extraction approach assisted by audio compression. Audio compression is designed to preserve content and speaker information for transmission, so the information lost after decompression is expected to be content- and speaker-independent (e.g., channel noise added during the replay process). We conducted comprehensive experiments with several data augmentation techniques and three classifiers on the ASVspoof 2021 physical access (PA) set and confirmed the effectiveness of the proposed feature extraction approach. To the best of our knowledge, the proposed approach achieves the lowest equal error rate (EER), 22.71%, on the ASVspoof 2021 PA evaluation set.
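The compression-residual idea can be illustrated with a minimal sketch. The abstract does not name the codec or the downstream features, so everything here is an assumption made for illustration: 8-bit mu-law companding stands in for the lossy codec, a framed log-magnitude spectrum stands in for the feature representation, and the function names (`mulaw_encode`, `mulaw_decode`, `compression_residual`, `log_spectrum`) are hypothetical.

```python
import numpy as np

MU = 255.0  # mu-law companding constant (assumed stand-in codec, not the study's)


def mulaw_encode(x, mu=MU):
    """Lossy 'compression': compand samples in [-1, 1], then quantize to 8 bits."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) / 2.0 * mu).astype(np.uint8)


def mulaw_decode(codes, mu=MU):
    """'Decompression': map 8-bit codes back to floating-point samples."""
    y = codes.astype(np.float64) / mu * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


def compression_residual(x):
    """Information lost by the codec: original minus decompressed signal."""
    return x - mulaw_decode(mulaw_encode(x))


def log_spectrum(x, frame_len=512, hop=256):
    """Framed log-magnitude spectrum, used here as an illustrative feature."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log(mag + 1e-10)


if __name__ == "__main__":
    # Synthetic one-second signal at an assumed 16 kHz sampling rate.
    t = np.arange(16000) / 16000.0
    speech_like = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    features = log_spectrum(compression_residual(speech_like))
    print(features.shape)
```

In practice a perceptual codec would replace the mu-law stand-in, and the residual features would be passed to a classifier; the sketch only shows the compress, decompress, and subtract steps.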