We address speaker-aware anti-spoofing, in which prior knowledge of the target speaker is incorporated into a voice spoofing countermeasure (CM). In contrast to the commonly used speaker-independent solutions, we train the CM in a speaker-conditioned way. As a proof of concept, we consider a speaker-aware extension of the state-of-the-art AASIST (audio anti-spoofing using integrated spectro-temporal graph attention networks) model. To this end, we consider two alternative strategies that incorporate target speaker information at the frame level and at the utterance level, respectively. Experimental results on a custom protocol based on the ASVspoof 2019 dataset indicate the effectiveness of providing speaker information via enrollment: over a speaker-independent baseline, we obtain maximum relative improvements of 25.1% in equal error rate (EER) and 11.6% in minimum tandem detection cost function (t-DCF).
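The difference between the two conditioning levels can be illustrated with a minimal sketch. It assumes that a fixed-length speaker embedding has already been extracted from the enrollment utterances and that conditioning is done by simple concatenation; these fusion choices, the mean pooling, and all function names below are hypothetical illustrations, not the AASIST graph-attention architecture or the exact method evaluated here. The sketch also shows the usual definition of relative improvement, (baseline - proposed) / baseline, as applied to error metrics such as EER.

```python
from typing import List

# Hypothetical types: a sequence of frame-level feature vectors (frames x dim).
Vector = List[float]
Matrix = List[Vector]


def frame_level_condition(frames: Matrix, spk_emb: Vector) -> Matrix:
    """Assumed frame-level fusion: append the same enrollment speaker
    embedding to every frame (broadcast over time, then concatenate)."""
    return [frame + spk_emb for frame in frames]  # builds new lists, input untouched


def mean_pool(frames: Matrix) -> Vector:
    """Average frame vectors into one utterance-level vector (assumed pooling)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def utterance_level_condition(frames: Matrix, spk_emb: Vector) -> Vector:
    """Assumed utterance-level fusion: pool the test utterance first,
    then concatenate the speaker embedding before the classifier."""
    return mean_pool(frames) + spk_emb


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction (in percent) of an error metric such as EER or min t-DCF."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy 3-frame, 2-dim features
    spk = [0.5]                                      # toy 1-dim speaker embedding
    print(frame_level_condition(frames, spk))
    print(utterance_level_condition(frames, spk))
    print(relative_improvement(2.0, 1.5))
```

In this sketch, frame-level conditioning lets every subsequent layer see the speaker information, whereas utterance-level conditioning injects it only after temporal aggregation, which is one plausible reading of the two strategies' trade-off.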