Deep learning models for speaker verification rely heavily on large amounts of correctly labeled data. However, noisy (incorrect) labels often occur and degrade system performance. In this paper, we propose a novel two-stage learning method to filter out noisy labels from speaker datasets. Because a deep neural network (DNN) tends to fit data with clean labels first, we first train the model on all data for several epochs. Then, based on this model, the model predictions are compared with the labels using our proposed OR-Gate with top-k mechanism to select the data with clean labels, and the selected data is used to train the model. This process is iterated until training is completed. Extensive experiments demonstrate the effectiveness of this method in filtering noisy labels, and it achieves excellent performance on VoxCeleb1 and VoxCeleb2 under different rates of added label noise.
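The selection step can be illustrated with a minimal sketch. The abstract does not define the OR-Gate precisely, so the code below assumes one plausible reading: a sample is treated as clean when its label matches any of the k highest-scoring speaker classes predicted by the model, that is, a logical OR over k equality tests. The function name `or_gate_topk_select`, its parameters, and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def or_gate_topk_select(scores, labels, k=3):
    """Return a boolean mask marking the samples treated as clean.

    Assumption (not confirmed by the source): a sample is kept when its
    label equals ANY of the k highest-scoring classes for that sample.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    if not 1 <= k <= scores.shape[1]:
        raise ValueError("k must be between 1 and the number of classes")
    # Indices of the k largest scores per row; order within the k is irrelevant.
    topk = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    # OR over the k comparisons: True if the label hits at least one of them.
    return (topk == labels[:, None]).any(axis=1)


# Toy example: 3 utterances, 3 speaker classes (hypothetical scores).
scores = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
])
labels = np.array([0, 0, 1])

mask_top1 = or_gate_topk_select(scores, labels, k=1)
mask_top2 = or_gate_topk_select(scores, labels, k=2)
clean_indices = np.flatnonzero(mask_top2)  # samples kept for the next round
```

Under this reading, a larger k makes the filter more permissive: the third utterance is rejected at k=1 but kept at k=2, while the second is rejected in both cases. In the two-stage procedure described above, such a mask would be recomputed from the current model and the retained subset used for further training.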