This paper presents a novel study of parameter-free attentive scoring for speaker verification. Parameter-free scoring provides the flexibility of comparing speaker representations without the need for an accompanying parametric scoring model. Inspired by the attention component in Transformer neural networks, we propose a variant of the scaled dot-product attention mechanism to compare enrollment and test segment representations. In addition, this work explores the effect on performance of (i) different types of normalization, (ii) independent versus tied query/key estimation, (iii) varying the number of key-value pairs, and (iv) pooling multiple enrollment utterance statistics. Experimental results averaged over four tasks show that a simple parameter-free attentive scoring mechanism can improve the average equal error rate (EER) by 10% over the best cosine similarity baseline.
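To make the idea of scoring with scaled dot-product attention concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes that each test segment is represented by a set of query and value vectors and each enrollment segment by a set of key and value vectors; the function name `attentive_score`, the softmax normalization, and the final cosine comparison are all illustrative assumptions. The scoring function itself has no trainable parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attentive_score(test_q, test_v, enroll_k, enroll_v):
    """Hypothetical parameter-free attentive score (an assumption, not the paper's exact method).

    test_q:   (n_q, d_k) test-side queries
    test_v:   (n_q, d_v) test-side values
    enroll_k: (n_k, d_k) enrollment-side keys
    enroll_v: (n_k, d_v) enrollment-side values
    """
    d_k = test_q.shape[-1]
    # Scaled dot-product attention logits between test queries and enrollment keys.
    logits = test_q @ enroll_k.T / np.sqrt(d_k)
    # Normalize over the enrollment keys (one possible choice of normalization).
    weights = softmax(logits, axis=-1)
    # Attention-weighted enrollment values, one per test query.
    attended = weights @ enroll_v
    # Compare each test value with its attended enrollment value via cosine similarity.
    num = np.sum(test_v * attended, axis=-1)
    den = np.linalg.norm(test_v, axis=-1) * np.linalg.norm(attended, axis=-1) + 1e-12
    return float(np.mean(num / den))


# Usage with random stand-in representations (4 key-value pairs per segment).
rng = np.random.default_rng(0)
test_q, test_v = rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
enroll_k, enroll_v = rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
score = attentive_score(test_q, test_v, enroll_k, enroll_v)
```

Under these assumptions, tied query/key estimation corresponds to producing queries and keys with the same estimator, and statistics from multiple enrollment utterances could be pooled by, for example, concatenating their key-value pairs along the first axis before scoring.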