The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to improve the performance of downstream SV systems because of artifacts in the signals predicted by SE models. To compensate for these artifacts, we propose a generic denoising framework named LC4SV, which can serve as a pre-processor for various unknown downstream SV models. In LC4SV, we employ a learning-based interpolation agent to automatically generate appropriate interpolation coefficients between the enhanced signal and its noisy input, improving SV performance in noisy environments. Our experimental results demonstrate that LC4SV consistently improves the performance of various unseen SV systems. To the best of our knowledge, this work is the first attempt to develop a learning-based interpolation scheme aimed at improving SV performance in noisy environments.
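To make the interpolation idea concrete, the following is a minimal sketch rather than the LC4SV implementation. It assumes the interpolation is a convex combination of the enhanced and noisy waveforms controlled by a single scalar coefficient in [0, 1]; the abstract does not specify the exact form. The function name `interpolate_signals` and the parameter `alpha` are hypothetical, and the coefficient, which LC4SV obtains from a learned agent, is simply passed in by hand here.

```python
from typing import List, Sequence


def interpolate_signals(
    noisy: Sequence[float], enhanced: Sequence[float], alpha: float
) -> List[float]:
    """Blend an enhanced waveform with its noisy input, sample by sample.

    Assumed form (not taken from the paper):
        output = alpha * enhanced + (1 - alpha) * noisy
    alpha = 1.0 returns the enhanced signal, alpha = 0.0 the noisy input.
    """
    if len(noisy) != len(enhanced):
        raise ValueError("noisy and enhanced must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * e + (1.0 - alpha) * n for n, e in zip(noisy, enhanced)]


# Hypothetical usage: in LC4SV the coefficient would be predicted per
# utterance by the learned agent; here it is fixed for illustration.
noisy_wave = [0.20, -0.10, 0.40]
enhanced_wave = [0.10, 0.00, 0.30]
blended = interpolate_signals(noisy_wave, enhanced_wave, alpha=0.5)
```

Mixing part of the noisy input back in is one way to mask enhancement artifacts at the cost of retaining some noise, which is why the choice of coefficient matters for the downstream SV model.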