As speech synthesis systems have advanced remarkably in recent years, robust deepfake detection systems that perform well against unseen synthesis systems have become increasingly important. In this paper, we propose a novel adaptive centroid shift (ACS) method that continually shifts the centroid representation, updating it as a weighted average of bonafide representations. Our approach uses only bonafide samples to define the centroid, which can yield a centroid specialized for one-class learning. Integrating ACS with one-class learning gathers bonafide representations into a single cluster, forming well-separated embeddings that are robust to unseen spoofing attacks. Our proposed method achieves an equal error rate (EER) of 2.19% on the ASVspoof 2021 deepfake dataset, outperforming all existing systems. Furthermore, t-SNE visualization illustrates that our method effectively maps the bonafide embeddings into a single cluster and successfully disentangles the bonafide and spoof classes.
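To make the centroid update concrete, the following is a minimal sketch of one way a centroid could be shifted as a weighted average of bonafide representations. It is not the paper's implementation: the class name `AdaptiveCentroid`, the running-mean weighting (each new bonafide embedding gets weight 1/n), the unit-length normalization, the cosine-similarity score, and the label convention (1 = bonafide, 0 = spoof) are all assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


class AdaptiveCentroid:
    """Hypothetical running centroid built from bonafide embeddings only."""

    def __init__(self, dim):
        self.centroid = [0.0] * dim
        self.count = 0  # number of bonafide embeddings absorbed so far

    def update(self, embeddings, labels):
        """Shift the centroid toward each bonafide embedding in the batch.

        Assumed label convention: 1 = bonafide, 0 = spoof.
        """
        for emb, label in zip(embeddings, labels):
            if label != 1:
                continue  # spoof samples never move the centroid
            self.count += 1
            weight = 1.0 / self.count  # assumed running-mean weighting
            unit = normalize(emb)
            # Weighted average of the previous centroid and the new embedding.
            self.centroid = [
                (1.0 - weight) * c + weight * u
                for c, u in zip(self.centroid, unit)
            ]
        return self.centroid

    def score(self, emb):
        """Cosine similarity to the centroid; higher means closer to bonafide."""
        c = normalize(self.centroid)
        u = normalize(emb)
        return sum(a * b for a, b in zip(c, u))


if __name__ == "__main__":
    acs = AdaptiveCentroid(dim=2)
    # Toy batch: two bonafide embeddings and one spoof embedding.
    acs.update([[1.0, 0.0], [0.0, 5.0], [0.0, 2.0]], [1, 0, 1])
    print(acs.centroid, acs.score([3.0, 3.0]))
```

Under these assumptions the centroid equals the mean of all normalized bonafide embeddings seen so far, so it keeps shifting as training proceeds while spoof samples leave it untouched. Other weightings, such as an exponential moving average, would fit the same description; the abstract does not specify which is used.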