Knowledge distillation (KD) is used to improve automatic speaker verification performance by enforcing consistency between a large teacher network and a lightweight student network at the embedding level or the label level. However, conventional label-level KD overlooks significant knowledge carried by non-target speakers, in particular their classification probabilities, which can be crucial for automatic speaker verification. In this paper, we first demonstrate that training with a larger number of non-target speakers improves the performance of automatic speaker verification models. Motivated by this finding on the importance of non-target speaker knowledge, we modify conventional label-level KD by disentangling and emphasizing the classification probabilities of non-target speakers during distillation. The proposed method is applied to three different student model architectures and achieves an average of 13.67% improvement in EER on the VoxCeleb dataset compared to embedding-level and conventional label-level KD methods.
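The abstract does not state the loss itself, so the following is only a minimal sketch of one plausible way to disentangle and emphasize non-target speaker probabilities in label-level KD. It assumes a decoupled formulation: a binary target-versus-rest KL term plus a KL term over the renormalized non-target distribution, with the latter up-weighted. The function names, the weights `alpha` and `beta`, and the `temperature` default are illustrative assumptions, not values taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def decoupled_kd_loss(student_logits, teacher_logits, target,
                      alpha=1.0, beta=4.0, temperature=4.0):
    """Hypothetical label-level KD loss with a separate non-target term.

    alpha weights the target-vs-rest term, beta weights the non-target
    term; choosing beta > alpha emphasizes non-target speakers.
    All defaults are assumptions made for illustration.
    """
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)

    # Binary distribution: target speaker vs. all non-target speakers.
    bin_student = [p_student[target], 1.0 - p_student[target]]
    bin_teacher = [p_teacher[target], 1.0 - p_teacher[target]]
    target_term = kl_divergence(bin_teacher, bin_student)

    # Distribution among non-target speakers only. A softmax over the
    # non-target logits equals the renormalized non-target probabilities.
    rest_student = softmax(
        [z for i, z in enumerate(student_logits) if i != target], temperature)
    rest_teacher = softmax(
        [z for i, z in enumerate(teacher_logits) if i != target], temperature)
    non_target_term = kl_divergence(rest_teacher, rest_student)

    # The temperature**2 factor is the usual KD gradient rescaling.
    return (alpha * target_term + beta * non_target_term) * temperature ** 2


if __name__ == "__main__":
    student = [2.0, 1.0, 0.1]   # toy logits over three training speakers
    teacher = [2.0, 0.1, 1.0]   # same target score, different non-target ranking
    print(decoupled_kd_loss(student, teacher, target=0))
```

In this toy case the teacher and student assign the same probability to the target speaker, so the target term is close to zero and the loss comes almost entirely from the disagreement over non-target speakers. A single KL over the full softmax would scale that disagreement by the non-target probability mass, whereas here `beta` controls it directly.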