DNN-based models achieve strong performance on the speaker verification (SV) task, but at substantial computational cost. Model compression can reduce model size and thereby lower resource consumption. The present study applies weight quantization to compress two widely used SV models, ECAPA-TDNN and ResNet. Experiments on VoxCeleb indicate that quantization is effective for compressing SV models: the model size can be reduced several-fold with no noticeable decline in performance. ResNet is more robust than ECAPA-TDNN under lower-bitwidth quantization. An analysis of the layer weights suggests that the smooth weight distribution of ResNet may contribute to this robustness. Additional experiments on CN-Celeb validate the quantized models' ability to generalize under language mismatch. Furthermore, information probing results show that the quantized models preserve most of the speaker-relevant knowledge learned by the original models.
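To make the idea of weight quantization concrete, the following is a minimal sketch of symmetric uniform quantization applied to a flat list of weights. It is an illustration only: the abstract does not specify which quantization scheme the study uses, so the per-tensor symmetric grid, the function names (`quantize_weights`, `dequantize_weights`, `compression_ratio`), the 32-bit full-precision baseline, and the toy weight values are all assumptions introduced here, not results or methods from the study.

```python
def quantize_weights(weights, bits):
    """Map float weights to signed integer codes on a symmetric uniform grid.

    Assumed scheme (not taken from the paper): one scale per tensor,
    chosen so that the largest-magnitude weight maps to the largest code.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_weights(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def compression_ratio(bits, full_bits=32):
    """Nominal size reduction, ignoring the small overhead of storing scales."""
    return full_bits / bits


if __name__ == "__main__":
    # Hypothetical toy weights, for illustration only.
    weights = [0.50, -0.25, 0.10, -0.05, 0.00]
    for bits in (8, 4):
        codes, scale = quantize_weights(weights, bits)
        restored = dequantize_weights(codes, scale)
        max_err = max(abs(w - r) for w, r in zip(weights, restored))
        print(f"{bits}-bit: codes={codes}, "
              f"ratio={compression_ratio(bits):.0f}x, max_err={max_err:.4f}")
```

Under this assumed scheme, the rounding error of each weight is bounded by half the step size, and the step size grows as the bitwidth shrinks. This is one way to see why a weight tensor with a few large outliers could be harder to quantize at low bitwidths than one with a smooth distribution, since outliers stretch the grid and coarsen the resolution for the remaining weights.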