Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation has been widely used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments. Experimental results show that PAS outperforms conventional additive noise in terms of equal error rate (EER), with relative improvements of 4.64% and 5.01% for SE-ResNet34 and ECAPA-TDNN, respectively. We further demonstrate the effectiveness of the proposed method by analyzing attention modules and visualizing speaker embeddings.
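For reference, the conventional additive-noise augmentation that PAS is compared against typically mixes a noise clip into the whole utterance at a target signal-to-noise ratio (SNR). The sketch below illustrates only that baseline, not PAS itself; the abstract does not describe how PAS modifies it. The function names `mean_power` and `add_noise_at_snr` are hypothetical, and plain Python lists stand in for audio arrays.

```python
import math
import random
from typing import List, Optional, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(s * s for s in x) / len(x)


def add_noise_at_snr(
    speech: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Conventional additive-noise augmentation (hypothetical helper):
    mix `noise` into the entire utterance so that the speech-to-noise
    power ratio equals `snr_db`."""
    rng = rng or random.Random()
    noise = list(noise)
    # Tile a short noise clip until it covers the whole utterance.
    if len(noise) < len(speech):
        noise = noise * (len(speech) // len(noise) + 1)
    # Take a random crop of the noise with the same length as the speech.
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(segment)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Choose gain g so that 10*log10(p_speech / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, segment)]
```

In training pipelines the SNR is usually drawn at random per utterance; the range used for these experiments is not stated in the abstract.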
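The reported gains are relative reductions in EER. As a sketch, assuming the usual definition of relative improvement, 100 × (EER_baseline - EER_new) / EER_baseline (the abstract does not give the absolute EERs), the code below estimates EER by sweeping decision thresholds and taking the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest, averaging the two. The names `compute_eer` and `relative_improvement` are hypothetical; the quadratic threshold sweep favors readability, whereas toolkits typically compute EER from a sorted ROC curve.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Equal error rate for verification trials (hypothetical helper).

    labels[i] is 1 for a target (same-speaker) trial and 0 for a
    non-target trial; a trial is accepted when its score >= threshold."""
    n_tar = sum(1 for y in labels if y == 1)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one that rejects all.
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_non
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_tar
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

As an illustrative example (not figures from the paper), a baseline EER of 2.00% that drops to 1.90% corresponds to a 5% relative improvement.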