In speech deepfake detection, a critical challenge is developing detectors able to generalize to unseen data and distinguish fake signals across different datasets. Common approaches to this challenge involve incorporating diverse data into the training process or fine-tuning models on unseen datasets. However, these solutions can be computationally demanding and may lead to the loss of knowledge acquired from previously seen data. Continual learning techniques offer a potential solution to this problem, allowing models to learn from unseen data without forgetting what they have already learned. Still, it remains unclear how best to apply these algorithms to speech deepfake detectors. In this paper, we address this question and investigate whether, when retraining a speech deepfake detector, it is more effective to apply continual learning across the entire model or to update only some of its layers while freezing the others. Our findings, validated across multiple models, indicate that the most effective of the analyzed strategies is to update only the weights of the initial layers, which are responsible for processing the input features of the detector.
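To make the partial-update strategy concrete, the sketch below shows one way to retrain only the initial, feature-processing layers of a detector while freezing the rest. This is a minimal illustration in PyTorch under assumed names: the DeepfakeDetector class, its frontend and classifier modules, and all hyperparameters are hypothetical placeholders, not the architectures or settings evaluated in the paper.

```python
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    """Toy stand-in for a speech deepfake detector (hypothetical)."""
    def __init__(self, n_feats=60, n_hidden=128):
        super().__init__()
        # Initial layers: process the input features (kept trainable).
        self.frontend = nn.Sequential(
            nn.Linear(n_feats, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        # Later layers: classification head (frozen during retraining).
        self.classifier = nn.Sequential(
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, 2),  # bona fide vs. fake
        )

    def forward(self, x):
        return self.classifier(self.frontend(x))

model = DeepfakeDetector()

# Freeze all weights, then re-enable gradients only for the initial layers.
for p in model.parameters():
    p.requires_grad = False
for p in model.frontend.parameters():
    p.requires_grad = True

# Only the trainable (initial) weights are handed to the optimizer used by
# the continual learning update on the new dataset.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

In this setup, any continual learning algorithm applied during retraining only updates the front-end weights, which matches the layer-selective scheme the abstract describes at a high level.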