A good supervised embedding for a specific machine learning task is sensitive only to changes in the label of interest and is invariant to other confounding factors. We leverage the concept of repeatability from measurement theory to describe this property and propose using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. We then propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses that guides deep neural networks to produce embeddings with higher repeatability. We use simulated data to explain why the ICC regularizer is more effective at minimizing intra-class variance than the contrastive loss alone. We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice. The experimental results demonstrate that adding the ICC regularizer improves the repeatability of the learned embeddings compared with using the contrastive loss alone; further, these embeddings lead to improved performance on all three downstream tasks.
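To make the repeatability measure concrete, the following is a minimal sketch of how an ICC could be computed per embedding dimension. The abstract does not state which ICC form is used, so this assumes the one-way random-effects ICC(1,1) with balanced classes; the function name `icc_one_way` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def icc_one_way(embeddings, labels):
    """One-way random-effects ICC(1,1) for each embedding dimension.

    Assumption (not from the paper): classes are balanced, i.e. every
    label has the same number k >= 2 of samples.
    embeddings: array of shape (num_samples, dim)
    labels:     array of shape (num_samples,)
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    if len(classes) < 2 or counts.min() < 2 or counts.min() != counts.max():
        raise ValueError("need >= 2 balanced classes with >= 2 samples each")

    n, k = len(classes), int(counts[0])
    grand_mean = embeddings.mean(axis=0)
    class_means = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

    # Between-class mean square: spread of class means around the grand mean.
    msb = k * ((class_means - grand_mean) ** 2).sum(axis=0) / (n - 1)

    # Within-class mean square: spread of samples around their class mean.
    class_index = np.searchsorted(classes, labels)
    residuals = embeddings - class_means[class_index]
    msw = (residuals ** 2).sum(axis=0) / (n * (k - 1))

    # ICC(1,1); yields NaN for a dimension that is constant everywhere.
    return (msb - msw) / (msb + (k - 1) * msw)
```

Under these assumptions, a value near 1 means that almost all variance in a dimension is explained by the label (high repeatability), while values near or below 0 mean that within-class variance dominates.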