Speech deepfakes pose a significant threat to personal security and content authenticity. Several detectors have been proposed in the literature, and one of the primary challenges they face is generalizing to unseen data so that fake signals can be identified across a wide range of datasets. In this paper, we introduce a novel approach that improves speech deepfake detection using a Mixture of Experts (MoE) architecture. The MoE framework is well suited to this task because its experts can specialize in different input types and handle data variability efficiently. This approach generalizes and adapts to unseen data better than traditional single models or ensemble methods. Its modular structure also supports scalable updates, which makes it more flexible in keeping pace with evolving deepfake techniques while maintaining high detection accuracy. We propose an efficient, lightweight gating mechanism that dynamically assigns expert weights to each input to optimize detection performance. Experimental results on multiple datasets demonstrate the effectiveness and potential of the proposed approach.
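To make the idea of per-input expert weighting concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the gate's form, so the linear-plus-softmax gate, the soft (weighted-sum) combination, the class name `GatedMixture`, the toy experts, and the four-dimensional feature vector are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class GatedMixture:
    """Hypothetical soft MoE: a linear gate weights per-expert fake scores."""

    def __init__(self, experts, feature_dim, seed=0):
        rng = random.Random(seed)
        self.experts = experts  # callables: features -> score in [0, 1]
        # Untrained gate parameters, randomly initialised for illustration only.
        self.gate_w = [[rng.gauss(0.0, 0.1) for _ in range(feature_dim)]
                       for _ in experts]
        self.gate_b = [0.0] * len(experts)

    def gate(self, features):
        """Per-input expert weights: non-negative and summing to 1."""
        return softmax(linear(self.gate_w, self.gate_b, features))

    def score(self, features):
        """Gate-weighted combination of the expert scores for one input."""
        weights = self.gate(features)
        scores = [expert(features) for expert in self.experts]
        return sum(w * s for w, s in zip(weights, scores))


# Toy stand-ins (assumed) for detectors trained on different datasets.
experts = [
    lambda f: sigmoid(f[0]),
    lambda f: sigmoid(f[1] - f[2]),
    lambda f: sigmoid(-f[3]),
]
moe = GatedMixture(experts, feature_dim=4)
features = [0.5, -1.2, 0.3, 2.0]  # made-up feature vector for one utterance
weights = moe.gate(features)
fake_score = moe.score(features)
```

Because the gate weights form a convex combination, `fake_score` always lies between the lowest and highest expert score for that input. In a trained system the gate parameters would be learned so that each input leans on the experts best suited to it; here they are random and only show the data flow.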