Finding the synthetic artifacts in spoofed data helps anti-spoofing countermeasure (CM) systems discriminate between spoofed and real speech. The Conformer combines the strengths of convolutional neural networks and the Transformer, which lets it aggregate both local and global information. This may help the CM system capture synthetic artifacts hidden both locally and globally. In this paper, we present a transfer-learning-based MFA-Conformer structure for CM systems. By pre-training the Conformer encoder on different tasks, we enhance the robustness of the CM system. The proposed method is evaluated on both Chinese and English spoofing detection databases. On the FAD clean set, the proposed method achieves an equal error rate (EER) of 0.04%, dramatically outperforming the baseline. Our system is also comparable to pre-training methods based on Wav2Vec 2.0. Moreover, we provide a detailed analysis of the robustness of different models.
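The headline result is reported as an equal error rate (EER): the operating point at which the false acceptance rate on spoofed trials equals the false rejection rate on real (bona fide) trials. The sketch below is not taken from the paper. It assumes detection scores where higher means "more likely real", labels of 1 for real and 0 for spoofed speech, and a hypothetical helper name `compute_eer`. It approximates the EER by trying every observed score as a threshold and averaging the two error rates where they are closest.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the equal error rate (as a fraction) of a binary detector.

    Assumed conventions (not from the paper): label 1 = real / bona fide,
    label 0 = spoofed, and a higher score means "more likely real".
    """
    bona = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not bona or not spoof:
        raise ValueError("need at least one real and one spoofed score")

    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        # False rejection: real trials scored below the threshold.
        frr = sum(s < t for s in bona) / len(bona)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(f"EER = {100 * compute_eer(scores, labels):.2f}%")
```

Multiplying the result by 100 gives the percentage form used in the abstract. This brute-force sweep is easy to check by hand but scales as O(N x T); evaluation toolkits typically derive the EER from a sorted ROC curve and may interpolate between operating points, so small score sets can give slightly different values.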