Robust speaker verification under noisy conditions remains an open challenge. Conventional deep learning methods learn a single, unified speaker representation space that is robust to diverse background noise, and they have achieved significant improvements. In contrast, this paper presents a noise-conditioned mixture-of-experts framework that decomposes the feature space into specialized, noise-aware subspaces for speaker verification. Specifically, we propose a noise-conditioned expert routing mechanism, a universal-model-based expert specialization strategy, and an SNR-decaying curriculum learning protocol, which together improve model robustness and generalization under diverse noise conditions. The proposed method automatically routes each input to expert networks based on noise information derived from that input; each expert targets distinct noise characteristics while preserving speaker identity information. Comprehensive experiments show consistent improvements over baselines, confirming that explicit noise-dependent feature modeling significantly enhances robustness without sacrificing verification accuracy.
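To make the routing and curriculum ideas concrete, the following is a minimal sketch rather than the paper's implementation. It assumes soft (softmax) gating over a linear gate driven by a noise embedding, and a linear decay of the training SNR range; the abstract specifies neither. All names (`route`, `mix_experts`, `snr_range_for_epoch`) and default values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(noise_embedding, gate_weights):
    """Return one mixing weight per expert.

    Assumption: the gate is a linear layer (one weight row per expert)
    applied to a noise embedding derived from the input.
    """
    scores = [
        sum(w * x for w, x in zip(row, noise_embedding))
        for row in gate_weights
    ]
    return softmax(scores)


def mix_experts(features, noise_embedding, experts, gate_weights):
    """Combine expert outputs, weighted by the noise-conditioned gate."""
    weights = route(noise_embedding, gate_weights)
    outputs = [expert(features) for expert in experts]
    dim = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(dim)
    ]


def snr_range_for_epoch(epoch, total_epochs,
                        snr_start=20.0, snr_end=0.0, width=5.0):
    """Return the (low, high) SNR range in dB used to mix noise this epoch.

    Assumption: the lower bound decays linearly from snr_start to snr_end,
    so training moves from cleaner to noisier examples.
    """
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    low = snr_start + frac * (snr_end - snr_start)
    return (low, low + width)
```

In this sketch the gate sees only the noise embedding, not the speaker features, which is one simple way to keep routing decisions dependent on noise rather than on speaker identity. A hard top-1 routing variant would replace the softmax with an argmax over the gate scores.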