Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. This problem can be addressed by using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages a DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. Experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.