Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks. Although state-of-the-art spoofing countermeasure (CM) models perform well on specific datasets, they generalize poorly when evaluated on different datasets. To address this limitation, previous studies have explored large pre-trained models, which require significant resources and time. We aim to develop a compact but well-generalizing CM model that can compete with large pre-trained models. Our approach involves multi-dataset co-training and sharpness-aware minimization, which has not been investigated in this domain. Extensive experiments reveal that the proposed method yields competitive results across various datasets while using 4,000 times fewer parameters than the large pre-trained models.
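To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy logistic-regression model in NumPy, two synthetic datasets that share a labeling rule but differ in input distribution, and a simple alternating schedule as one possible form of co-training. All function names, the data, and the hyperparameters (lr, rho, batch size) are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def sam_step(w, X, y, lr=0.5, rho=0.05):
    """One sharpness-aware update (lr and rho are illustrative values)."""
    # 1) Gradient at the current weights.
    _, g = loss_and_grad(w, X, y)
    # 2) Move a distance rho along the normalized gradient (local worst case).
    e = rho * g / (np.linalg.norm(g) + 1e-12)
    # 3) Gradient at the perturbed weights, applied to the original weights.
    _, g_perturbed = loss_and_grad(w + e, X, y)
    return w - lr * g_perturbed

def make_toy_dataset(rng, n, scale, shift):
    """Hypothetical stand-in for one corpus: same labels rule, shifted inputs."""
    feats = rng.normal(size=(n, 2)) * scale + shift
    y = (feats[:, 0] + feats[:, 1] > 0).astype(float)
    X = np.hstack([feats, np.ones((n, 1))])  # append a bias feature
    return X, y

def co_train(datasets, steps=200, batch_size=32, seed=0):
    """Alternate mini-batches across datasets, one SAM step per batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(datasets[0][0].shape[1])
    for step in range(steps):
        X, y = datasets[step % len(datasets)]  # round-robin over datasets
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w = sam_step(w, X[idx], y[idx])
    return w
```

Each SAM step costs two gradient evaluations instead of one. The round-robin schedule is only one way to mix datasets; the abstract does not specify how batches are combined.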