We propose a new loss function for speaker recognition with deep neural networks, based on the Jeffreys divergence. Adding this divergence to the cross-entropy loss makes it possible to maximize the target value of the output distribution while smoothing the non-target values. The resulting objective function yields highly discriminative features. Beyond this effect, we provide a theoretical justification of its effectiveness and examine how the loss affects the model, in particular its impact across dataset types (i.e., in-domain or out-of-domain w.r.t. the training corpus). Our experiments show that the Jeffreys loss consistently outperforms the state of the art for speaker recognition, especially on out-of-domain data, and helps limit false alarms.
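The abstract does not state the exact form of the loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It uses the standard definition of the Jeffreys divergence, J(p, q) = KL(p||q) + KL(q||p), and adds it to the cross-entropy. The reference distribution (`smoothed_target`, which puts 1 - eps on the target and spreads eps uniformly over the other classes), the weight `weight`, and all function names are assumptions introduced for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def jeffreys_divergence(p, q):
    # Standard definition: J(p, q) = KL(p||q) + KL(q||p)
    #                              = sum_i (p_i - q_i) * log(p_i / q_i).
    # Requires strictly positive entries in both distributions.
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def smoothed_target(num_classes, target, eps=0.1):
    # Hypothetical reference distribution (an assumption, not from the paper):
    # 1 - eps on the target class, eps spread uniformly over the others.
    off = eps / (num_classes - 1)
    return [1.0 - eps if i == target else off for i in range(num_classes)]

def jeffreys_loss(logits, target, weight=1.0, eps=0.1):
    # Cross-entropy plus a weighted Jeffreys term between the assumed
    # reference distribution and the network output distribution.
    q = softmax(logits)
    cross_entropy = -math.log(q[target])
    ref = smoothed_target(len(logits), target, eps)
    return cross_entropy + weight * jeffreys_divergence(ref, q)

# Two outputs with the same target probability: the one with flat
# non-target values has the smaller divergence from the reference.
ref = smoothed_target(3, target=0, eps=0.1)
flat = jeffreys_divergence(ref, [0.9, 0.05, 0.05])
uneven = jeffreys_divergence(ref, [0.9, 0.09, 0.01])
print(flat < uneven)
```

Under these assumptions, the added term penalizes uneven non-target probabilities even when the target probability is unchanged, which is one way to read the "smoothing the non-target values" effect described above. The symmetric form needs a reference with no zero entries, which is why a one-hot target is not used directly here.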