DNN-based speaker verification (SV) models achieve strong performance, but at relatively high computational cost. Model compression can reduce model size and thereby lower resource consumption. The present study applies weight quantization to compress two widely used SV models, ECAPA-TDNN and ResNet. Experimental results on VoxCeleb show that weight quantization is effective for compressing SV models: the model size can be reduced several-fold without noticeable degradation in performance. Under lower-bitwidth quantization, the compressed ResNet is more robust than the compressed ECAPA-TDNN. Analysis of the layer weights suggests that the smooth weight distribution of ResNet may be related to its better robustness. The generalization ability of the quantized models is validated on a language-mismatched SV task. Furthermore, analysis by information probing reveals that the quantized models retain most of the speaker-relevant knowledge learned by the original models.
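As a rough illustration of what weight quantization involves, the following minimal sketch applies symmetric uniform quantization to a list of weights and reports the nominal size reduction. It is not the scheme used in this study, which the abstract does not specify; the function names (`quantize_uniform`, `dequantize`, `compression_ratio`), the per-tensor scale, and the synthetic Gaussian weights are all assumptions made for illustration.

```python
import random


def quantize_uniform(weights, num_bits):
    """Map floats to signed integer codes on a symmetric uniform grid.

    Assumption: one scale per tensor, chosen from the largest magnitude.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed symmetric grid")
    qmax = 2 ** (num_bits - 1) - 1  # largest code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clip to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def compression_ratio(num_bits, full_precision_bits=32):
    """Nominal size reduction, ignoring the small cost of storing the scale."""
    return full_precision_bits / num_bits


# Synthetic stand-in for one layer's weights (hypothetical data).
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(1000)]

codes, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With this scheme the per-weight rounding error is bounded by half the grid step (`scale / 2`), so a layer whose weights have a few large outliers gets a coarser grid for the rest of its values. That is one plausible reason a smoother weight distribution might tolerate lower bitwidths better, though the sketch does not test this.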