This study aims to develop a single integrated spoofing-aware speaker verification (SASV) embedding that satisfies two requirements. First, it should reject both non-target speakers' inputs and target speakers' spoofed inputs. Second, it should perform competitively with the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single-embedding solutions by a large margin in the SASV2022 challenge. Our analysis attributes the inferior performance of single SASV embeddings to an insufficient amount of training data and the distinct nature of the ASV and CM tasks. To address these issues, we propose a novel framework that includes multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also exploited to address the lack of spoofed data. Experimental results show dramatic improvements, achieving a SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
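For readers unfamiliar with the metric, the SASV-EER quoted in the abstract is an equal error rate in which target trials are the only positives, while non-target and spoofed trials are pooled as negatives. The following is a minimal sketch of that idea, not the challenge's official scoring script: the function names `equal_error_rate` and `sasv_eer` are hypothetical, the scores are made up for illustration, and the EER is approximated by a simple sweep over observed scores rather than by interpolating the error curves.

```python
from typing import Sequence


def equal_error_rate(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of the false-acceptance and false-rejection rates at
    the threshold where the two rates are closest.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative score")
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(positives) | set(negatives)):
        far = sum(s >= thr for s in negatives) / len(negatives)  # false acceptance
        frr = sum(s < thr for s in positives) / len(positives)   # false rejection
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def sasv_eer(
    target: Sequence[float],
    nontarget: Sequence[float],
    spoof: Sequence[float],
) -> float:
    """SASV-style EER: target trials vs. pooled non-target and spoofed trials."""
    return equal_error_rate(list(target), list(nontarget) + list(spoof))


if __name__ == "__main__":
    # Illustrative scores only (higher means "more likely a bona fide target").
    target_scores = [0.9, 0.8, 0.7, 0.6]
    nontarget_scores = [0.1, 0.2]
    spoof_scores = [0.3, 0.65]
    print(f"SASV-EER: {sasv_eer(target_scores, nontarget_scores, spoof_scores):.2%}")
```

Pooling the two negative classes is what makes the metric demanding for a single embedding: one score has to separate target speech from both impostors and spoofed audio at the same threshold.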