Spoofing detection systems are typically trained on diverse recordings from multiple speakers, often under the assumption that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task (SInMT) framework: one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated on four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with a reduction of up to 48% for the most challenging attacks (e.g., A11).
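To illustrate how a gradient reversal layer can switch a multi-task model between modeling and removing speaker information, the following is a minimal scalar sketch in plain Python. It is not the SInMT implementation: the one-weight encoder, the two linear heads, the squared-error losses, and all names (`grl_backward`, `encoder_grad`, `lam`, `reverse`) are assumptions made for illustration only.

```python
def grl_backward(grad, lam):
    """Backward pass of a gradient reversal layer (identity in the forward pass).

    The incoming gradient is scaled by lam and its sign is flipped.
    """
    return -lam * grad


def encoder_grad(x, y_spoof, y_spk, w, a, b, lam=1.0, reverse=True):
    """Gradient of the joint loss with respect to the toy encoder weight w.

    Hypothetical toy model (an assumption, not the paper's architecture):
      embedding      e = w * x
      spoofing head  s = a * e, loss (s - y_spoof) ** 2
      speaker head   p = b * e, loss (p - y_spk) ** 2
    """
    # Forward pass; the reversal layer leaves the embedding unchanged.
    e = w * x
    spoof_out = a * e
    spk_out = b * e

    # Gradients of each loss with respect to the shared embedding.
    d_spoof_de = 2.0 * (spoof_out - y_spoof) * a
    d_spk_de = 2.0 * (spk_out - y_spk) * b

    if reverse:
        # Speaker-invariant variant: the encoder is pushed to make the
        # speaker task harder, while the speaker head itself still learns.
        spk_term = grl_backward(d_spk_de, lam)
    else:
        # Speaker-aware variant: plain multi-task learning.
        spk_term = lam * d_spk_de

    # Chain rule through e = w * x.
    return (d_spoof_de + spk_term) * x


if __name__ == "__main__":
    args = dict(x=1.0, y_spoof=0.0, y_spk=0.0, w=1.0, a=1.0, b=1.0)
    print("speaker-invariant:", encoder_grad(**args, lam=1.0, reverse=True))
    print("speaker-aware:    ", encoder_grad(**args, lam=1.0, reverse=False))
```

In this toy setting the only difference between the two variants is the sign of the speaker-loss gradient that reaches the shared encoder. With `reverse=False` the encoder is updated to help both tasks, and with `reverse=True` it is updated to help spoofing detection while working against speaker recognition. In a real system the same idea would typically be expressed as a custom autograd operation in a deep learning framework.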