The rapid evolution of speech synthesis and voice conversion has raised substantial concerns about misuse of the technology, creating a pressing need for effective audio deepfake detection. Existing detection models discriminate known deepfake audio well but struggle when they encounter new attack types. Continual learning has emerged as one effective way to address this challenge. In this paper, we propose a continual learning approach for audio deepfake detection called Radian Weight Modification (RWM). The fundamental idea of RWM is to divide all classes into two groups: those whose feature distributions remain compact across tasks, such as genuine audio, and those whose distributions are more spread out, such as the various types of fake audio. The distinction is quantified by the in-class cosine distance, which RWM then uses as the basis for a trainable gradient modification direction for each type of data. Experimental comparisons with mainstream continual learning methods show that RWM is superior in both acquiring new knowledge and mitigating forgetting in audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection: it also shows potential in other machine learning domains such as image recognition.
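To make the in-class cosine distance concrete, the following is a minimal sketch that assumes it can be read as the mean pairwise cosine distance between feature vectors of the same class. The abstract does not give the exact definition used by RWM, so the function names and the toy feature vectors here are hypothetical and purely illustrative.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def in_class_cosine_distance(features):
    """Mean pairwise cosine distance within one class.

    Assumption: this averaging is an illustrative reading of the
    in-class cosine distance, not the paper's stated formula.
    Requires at least two feature vectors.
    """
    pairs = list(combinations(features, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D features: one tightly clustered class, one scattered class.
genuine = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
fake = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]

compact = in_class_cosine_distance(genuine)
spread = in_class_cosine_distance(fake)
print(f"genuine: {compact:.3f}, fake: {spread:.3f}")
```

Under this reading, a small value marks a class as compact (the genuine-audio group) and a large value marks it as spread out (the fake-audio group). How RWM converts that distinction into a gradient modification direction is not specified in the abstract and is not sketched here.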