Current research in synthesized speech detection focuses primarily on how well detection systems generalize to unseen spoofing methods applied to noise-free speech. However, the performance of anti-spoofing countermeasure (CM) systems often degrades in more challenging scenarios, such as those involving noise and reverberation. To improve the robustness of CM systems, we propose a transfer learning-based speech enhancement front-end joint optimization (TL-SEJ) method and investigate its effectiveness against noise and reverberation. We evaluate the proposed method through a series of comparative and ablation experiments. The results show that, across different signal-to-noise ratio (SNR) test conditions, TL-SEJ improves recognition accuracy by 2.7% to 15.8% over the baseline. Compared with conventional data augmentation methods, our system improves accuracy by 0.7% to 5.8% under various noisy conditions and by 1.7% to 2.8% under different RT60 reverberation scenarios. These experiments demonstrate that the proposed method effectively enhances system robustness in noisy and reverberant conditions.
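The abstract reports results across several SNR test conditions but does not describe how the noisy test signals were built. As an illustration only, the sketch below shows one common way to mix a noise recording into clean speech at a chosen SNR. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the procedure is an assumption, not the authors' pipeline.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR (dB).

    Assumption: SNR is defined on average power over the whole utterance.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / P_noise)  =>  solve for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def measured_snr_db(clean, noisy):
    """Compute the SNR (dB) of a mixture given the clean reference signal."""
    clean = np.asarray(clean, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    speech = np.sin(2.0 * np.pi * 220.0 * t)   # stand-in for a speech signal
    noise = rng.standard_normal(8000)          # stand-in for a noise recording
    for snr in (0, 10, 20):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so any change in CM accuracy can be attributed to the noise level rather than to a change in signal gain.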