Recent research in speaker verification has increasingly focused on robust and reliable recognition under challenging channel conditions and in noisy environments. Identifying speakers in radio communications is particularly difficult because of inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that improves the robustness of the current speaker verification pipeline in three respects: data sourcing, data augmentation, and the efficiency of model transfer. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. The module also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the need for extensive additional training time and large amounts of data. Moreover, we develop a toolkit for assembling a large-scale radio speech corpus and establish a benchmark specifically tailored to speaker verification in radio scenarios. Experimental results demonstrate that the proposed methodology improves performance and mitigates the degradation caused by radio transmission in speaker verification tasks. The code will be available on GitHub.
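The abstract does not specify how the augmentation module is implemented, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. Bandwidth manipulation is approximated here by zeroing FFT bins outside a randomly drawn passband, and "noise within the manifold space" is approximated by adding scaled Gaussian noise to intermediate feature vectors. The function names `random_band_limit` and `add_feature_noise`, the cutoff ranges, and the noise scale are all hypothetical and are not taken from the CRSL code.

```python
import numpy as np


def random_band_limit(wave, sample_rate, rng, low_hz=300.0,
                      min_high_hz=2000.0, max_high_hz=None):
    """Band-limit a waveform to [low_hz, high_hz], with high_hz drawn at random.

    Hypothetical helper: the cutoff ranges are illustrative assumptions,
    not values reported by the authors.
    """
    nyquist = sample_rate / 2.0
    if max_high_hz is None:
        max_high_hz = nyquist
    high_hz = rng.uniform(min_high_hz, max_high_hz)
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    # Zero every frequency bin outside the simulated channel passband.
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(wave)), high_hz


def add_feature_noise(features, rng, scale=0.1):
    """Add Gaussian noise to a (batch, dim) array of intermediate features.

    Assumed stand-in for manifold-space noise: the perturbation is scaled
    by each dimension's standard deviation across the batch.
    """
    std = features.std(axis=0, keepdims=True)
    return features + scale * std * rng.standard_normal(features.shape)


# Usage: simulate a narrowband channel on one second of 16 kHz audio,
# then perturb a batch of (placeholder) hidden features.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
narrowband, cutoff = random_band_limit(clean, sample_rate, rng)
hidden = rng.standard_normal((8, 4))
noisy_hidden = add_feature_noise(hidden, rng)
```

Drawing a new cutoff for every training example exposes the model to a range of effective bandwidths, which is the general motivation the abstract gives for manipulating input bandwidth. A real pipeline would more likely use a proper filter design or resampling rather than hard FFT masking, which can introduce ringing at the band edges.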