The proliferation of deepfake technology poses significant challenges to the authenticity and trustworthiness of digital media, necessitating the development of robust detection methods. This study explores the application of Swin Transformers, a state-of-the-art architecture that computes self-attention within shifted windows, to the detection and classification of deepfake images. Using the Real and Fake Face Detection dataset released by Yonsei University's Computational Intelligence and Photography Lab, we evaluate the Swin Transformer alongside hybrid models such as Swin-ResNet and Swin-KNN, focusing on their ability to identify subtle manipulation artifacts. Our results demonstrate that the Swin Transformer outperforms conventional CNN-based architectures, including VGG16, ResNet18, and AlexNet, achieving a test accuracy of 71.29\%. We further present insights into hybrid model design, highlighting the complementary strengths of transformer- and CNN-based approaches to deepfake detection. This study underscores the potential of transformer-based architectures to improve accuracy and generalizability in image-based manipulation detection, paving the way for more effective countermeasures against deepfake threats.
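The shifted-window mechanism the abstract refers to can be illustrated with a minimal NumPy sketch. This is a toy illustration of the general Swin idea (partition a feature map into non-overlapping windows, then cyclically shift the map so the next layer's windows straddle the previous layer's boundaries), not the paper's implementation; the map size, window size, and channel count below are arbitrary choices for demonstration.

```python
import numpy as np

def window_partition(x, window_size):
    """Split a feature map of shape (H, W, C) into non-overlapping
    (window_size x window_size) windows; returns (num_windows, ws, ws, C)."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

def cyclic_shift(x, window_size):
    """Roll the map by half a window so that the following attention layer
    attends across the previous layer's window boundaries."""
    shift = window_size // 2
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

# Toy 8x8 single-channel feature map, window size 4.
fmap = np.arange(64, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(fmap, 4)                      # 4 windows of 4x4
shifted = window_partition(cyclic_shift(fmap, 4), 4)     # windows now overlap old borders
print(regular.shape, shifted.shape)                      # (4, 4, 4, 1) (4, 4, 4, 1)
```

In the full architecture, self-attention is computed independently inside each window, and alternating regular and shifted layers lets information propagate across window boundaries at linear cost in image size.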