The development of deep neural networks (DNNs) has significantly improved the performance of speaker verification (SV) systems in recent years. However, domain mismatch remains a critical issue when DNN-based SV systems are deployed in practical applications, and domain adaptation is needed to mitigate the resulting performance degradation. This paper introduces an approach, inspired by adversarial reprogramming, that adapts DNN-based SV models by manipulating learnable parameters at the model input. The pre-trained SV model remains fixed and is used only in the forward pass, so it acts as a black-box model. A lightweight network estimates the gradients for the learnable input parameters, which bypasses gradient backpropagation through the black-box model. The reprogrammed output is then processed by a two-layer backend learning module to produce the final adapted speaker embedding. Because only a small number of parameters are involved in the gradient calculation and few additional parameters are introduced, the proposed method is both memory-efficient and parameter-efficient. Experiments are conducted in language mismatch scenarios. In these experiments, the proposed method achieves performance close to or better than that of fully fine-tuned models at a much lower computational cost, which demonstrates its effectiveness.
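The core idea of updating learnable input parameters while the pre-trained model is used only in the forward pass can be illustrated with a minimal sketch. The abstract does not specify how the lightweight network estimates gradients, so this sketch substitutes a simple two-point random-direction (zeroth-order) estimator in its place; it is not the paper's method, and it omits the two-layer backend module. All names (`frozen_model`, `loss`, `estimate_grad`, `adapt`), the toy weights, and the squared-error objective are assumptions made for illustration only.

```python
import math
import random

# Hypothetical stand-in for the frozen pre-trained SV model: a fixed map
# from a 4-dim input to a 2-dim "embedding". Only forward calls are used.
_W = [[0.5, -0.2, 0.1, 0.3],
      [-0.4, 0.6, 0.2, -0.1]]


def frozen_model(x):
    """Forward pass of the black-box model; no gradients are exposed."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in _W]


def loss(delta, x, target):
    """Squared error between the embedding of the reprogrammed input and a target."""
    emb = frozen_model([xi + di for xi, di in zip(x, delta)])
    return sum((e - t) ** 2 for e, t in zip(emb, target))


def estimate_grad(delta, x, target, rng, mu=1e-3, n_dirs=20):
    """Two-point random-direction estimate of d(loss)/d(delta).

    Assumption: this replaces the paper's lightweight gradient-estimation
    network with a generic zeroth-order estimator, for illustration only.
    """
    grad = [0.0] * len(delta)
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in delta]
        plus = loss([d + mu * ui for d, ui in zip(delta, u)], x, target)
        minus = loss([d - mu * ui for d, ui in zip(delta, u)], x, target)
        slope = (plus - minus) / (2.0 * mu)  # directional derivative along u
        grad = [g + slope * ui / n_dirs for g, ui in zip(grad, u)]
    return grad


def adapt(x, target, steps=400, lr=0.2, seed=0):
    """Learn an additive input offset using forward passes of the frozen model only."""
    rng = random.Random(seed)
    delta = [0.0] * len(x)
    for _ in range(steps):
        grad = estimate_grad(delta, x, target, rng)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta
```

Calling `adapt(x, target)` on a toy input returns the learned offsets. The point of the design is that `frozen_model` is only ever evaluated, never differentiated, so no activations of the large model need to be stored for backpropagation.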