Recent deepfake detection methods have increasingly explored frequency-domain representations to reveal manipulation artifacts that are difficult to detect in the spatial domain. However, most existing approaches rely primarily on spectral magnitude, leaving the role of phase information underexplored. In this work, we propose Phase4DFD, a phase-aware frequency-domain deepfake detection framework that explicitly models phase-magnitude interactions via a learnable attention mechanism. Our approach augments the standard RGB input with Fast Fourier Transform (FFT) magnitude and local binary pattern (LBP) representations to expose subtle synthesis artifacts that remain indistinguishable under spatial analysis alone. Crucially, we introduce an input-level phase-aware attention module that uses the phase discontinuities commonly introduced by synthetic generation to guide the model, before backbone feature extraction, toward the frequency patterns most indicative of manipulation. The attended multi-domain representation is processed by an efficient BNext-M backbone, with optional channel-spatial attention applied for semantic feature refinement. Extensive experiments on the CIFAKE and DFFD datasets demonstrate that Phase4DFD outperforms state-of-the-art spatial and frequency-based detectors while maintaining low computational overhead. Comprehensive ablation studies further confirm that explicit phase modeling provides complementary, non-redundant information beyond magnitude-only frequency representations.
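The multi-domain input described above can be illustrated with a minimal NumPy sketch. It is not the Phase4DFD implementation. It assumes that the FFT and LBP are computed on a grayscale version of the image, that the spectrum is log-scaled and centered, and that the learnable attention can be stood in for by a fixed sigmoid gate over the magnitude and a wrapped phase-difference map. All function names and the weights `w_mag`, `w_phase`, and `bias` are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    # Luma conversion with ITU-R BT.601 weights; rgb is (H, W, 3) in [0, 1].
    return rgb @ np.array([0.299, 0.587, 0.114])

def fft_magnitude_phase(gray):
    # Centered 2D spectrum: log-scaled magnitude and phase in [-pi, pi].
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum)), np.angle(spectrum)

def lbp_8(gray):
    # Basic 8-neighbour local binary pattern on an edge-padded image.
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        code |= (neighbour >= gray).astype(np.uint8) << np.uint8(bit)
    return code

def normalise(x):
    # Min-max scale to [0, 1]; a constant map becomes all zeros.
    x = x.astype(np.float64)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def phase_gate(magnitude, phase, w_mag=1.0, w_phase=1.0, bias=0.0):
    # Hypothetical fixed gate standing in for the learnable attention:
    # wrapped phase differences between neighbouring frequency bins
    # serve as a simple phase-discontinuity measure.
    dy = np.angle(np.exp(1j * np.diff(phase, axis=0, append=phase[-1:, :])))
    dx = np.angle(np.exp(1j * np.diff(phase, axis=1, append=phase[:, -1:])))
    discontinuity = normalise(np.hypot(dy, dx))
    logits = w_mag * normalise(magnitude) + w_phase * discontinuity + bias
    return 1.0 / (1.0 + np.exp(-logits))

def build_input(rgb):
    # Stack RGB, gated FFT magnitude, and LBP into an (H, W, 5) array.
    gray = to_gray(rgb)
    magnitude, phase = fft_magnitude_phase(gray)
    attended = normalise(magnitude) * phase_gate(magnitude, phase)
    lbp = normalise(lbp_8(gray))
    return np.concatenate([rgb, attended[..., None], lbp[..., None]], axis=-1)

if __name__ == "__main__":
    image = np.random.default_rng(0).random((32, 32, 3))
    print(build_input(image).shape)
```

In a trained model the gate parameters would be learned jointly with the backbone rather than fixed by hand, and the resulting five-channel array would be passed to the backbone in place of the plain RGB image.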