The mismatch between closed-set training and open-set testing usually leads to significant performance degradation in the speaker verification task. Among existing loss functions, metric learning-based objectives depend strongly on searching for effective pairs, which can hinder further improvement, while popular multi-class classification methods usually degrade when evaluated on unseen speakers. In this work, we introduce the SphereFace2 framework, which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-class classification. Benefiting from this learning paradigm, it can effectively narrow the gap between training and evaluation. Experiments conducted on VoxCeleb show that SphereFace2 outperforms other existing loss functions, especially on hard trials. In addition, the large-margin fine-tuning strategy proves compatible with it and yields further improvements. Finally, SphereFace2 also shows strong robustness to class-wise noisy labels, which gives it the potential to be applied in semi-supervised training scenarios with inaccurately estimated pseudo labels. Code is available at https://github.com/Hunterhuan/sphereface2_speaker_verification
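To make the idea of "several binary classifiers instead of multi-class classification" concrete, the following is a minimal sketch of a one-vs-rest loss on cosine similarities. It is a simplified illustration under stated assumptions, not the authors' exact formulation: the function names (`softplus`, `cosine`, `binary_classifier_loss`) and the hyperparameter defaults (`scale`, `margin`, `bias`, `pos_weight`) are hypothetical, and the similarity adjustment used in the released code may differ.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine(u, v) -> float:
    """Cosine similarity between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binary_classifier_loss(embedding, class_weights, label,
                           scale=30.0, margin=0.2, bias=0.0, pos_weight=0.7):
    """One-vs-rest loss with one binary logistic classifier per training speaker.

    All default hyperparameter values are illustrative assumptions.
    """
    total = 0.0
    for k, weight in enumerate(class_weights):
        cos = cosine(embedding, weight)
        if k == label:
            # Classifier of the true speaker: reward cos well above +margin.
            total += pos_weight * softplus(-(scale * (cos - margin) + bias))
        else:
            # Every other classifier: reward cos well below -margin.
            total += (1.0 - pos_weight) * softplus(scale * (cos + margin) + bias)
    return total / scale


# Toy usage: three speaker "prototypes" in 2-D and one embedding.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [1.0, 0.0]
loss_correct = binary_classifier_loss(embedding, weights, label=0)
loss_wrong = binary_classifier_loss(embedding, weights, label=1)
```

Unlike a softmax loss, each classifier's term here depends only on its own similarity score, so the classes do not compete through a shared normalization. In the toy usage, the embedding coincides with the first prototype, so the loss for `label=0` comes out smaller than for `label=1`.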