A Multi-Domain Feature Fusion Framework for Generalizable Deepfake Detection Across Different Generators

Deepfakes are synthetically generated images, audio, or video that threaten privacy, security, and information integrity. Detecting such content is crucial for countering disinformation, because the latest generative models produce highly realistic output. While spatial- or frequency-based approaches achieve good detection rates on deepfakes produced by Generative Adversarial Networks (GANs), they often struggle with images generated by recent diffusion models. In particular, existing approaches rarely exploit complementary multi-domain representations or systematically evaluate cross-generator robustness. To address these challenges, we propose SGFF-Net (Spatial-Gradient-Frequency Fusion Network), a multi-domain deepfake detection framework that integrates spatial, gradient, and Discrete Wavelet Transform (DWT)-based frequency representations within a dual residual learning architecture. Experimental results show that SGFF-Net achieves 98.95% accuracy in intra-dataset evaluation and improves performance in both cross-model (70.46%) and cross-paradigm (69.94%) settings. Incorporating multi-source training and data augmentation further enhances robustness, increasing accuracy from 70.46% to 79.80% in cross-model evaluation, from 69% to 78% in cross-paradigm evaluation, and from 61.50% to 75.80% on real-world data. Unlike single-domain detectors, SGFF-Net learns complementary forensic cues across the spatial, gradient, and wavelet-frequency domains, resulting in greater robustness under cross-generator and cross-paradigm evaluation. The results further show that combining multi-domain representations with data diversity and augmentation substantially improves generalization, providing practical insights for developing more reliable deepfake detection systems.
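To make the three input domains concrete, the following is a minimal sketch of how spatial, gradient, and DWT-based frequency views of a single grayscale image might be computed. It is not the SGFF-Net implementation: the abstract does not specify the gradient operator, the wavelet family, or the decomposition level, so the central-difference gradient, the single-level orthonormal Haar wavelet, and the function names `haar_dwt2` and `multi_domain_views` are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x: np.ndarray) -> np.ndarray:
    """Single-level 2D orthonormal Haar DWT (assumed wavelet, not from the paper).

    Returns an array of shape (4, H//2, W//2) holding the approximation band
    followed by three detail bands.
    """
    # Crop to even height and width so the image splits into 2x2 blocks.
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w].astype(np.float64)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (local average)
    lh = (a - b + c - d) / 2.0  # differences across columns
    hl = (a + b - c - d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return np.stack([ll, lh, hl, hh])


def multi_domain_views(gray: np.ndarray) -> dict:
    """Build illustrative spatial, gradient, and frequency views of one image."""
    spatial = gray.astype(np.float64)
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns).
    gy, gx = np.gradient(spatial)
    gradient = np.hypot(gx, gy)  # gradient magnitude per pixel
    frequency = haar_dwt2(spatial)
    return {"spatial": spatial, "gradient": gradient, "frequency": frequency}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))  # stand-in for a grayscale face crop
    for name, view in multi_domain_views(image).items():
        print(name, view.shape)
```

In a fusion network of the kind described, each view would typically feed its own feature-extraction branch before the features are combined; how SGFF-Net performs that fusion is not detailed in the abstract, so it is not sketched here.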
