The proliferation of AI-generated imagery and sophisticated editing tools has rendered traditional forensic methods ineffective for cross-domain forgery detection. We present ForensicFormer, a hierarchical multi-scale framework that unifies low-level artifact detection, mid-level boundary analysis, and high-level semantic reasoning via cross-attention transformers. Unlike prior single-paradigm approaches, which achieve below 75% accuracy on out-of-distribution datasets, our method maintains 86.8% average accuracy across seven diverse test sets spanning traditional manipulations, GAN-generated images, and diffusion model outputs, a significant improvement over state-of-the-art universal detectors. We demonstrate superior robustness to JPEG compression (83% accuracy at Q=70 vs. 66% for baselines) and provide pixel-level forgery localization with an F1-score of 0.76. Extensive ablation studies validate that each hierarchical component contributes a 4-10% accuracy improvement, and qualitative analysis reveals interpretable forensic features aligned with human expert reasoning. Our work bridges classical image forensics and modern deep learning, offering a practical solution for real-world deployment where manipulation techniques are unknown a priori.
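The three-level fusion described above could be sketched as follows. This is a minimal illustrative sketch only: the module names, feature dimensions, and the specific use of `nn.MultiheadAttention` for the cross-attention fusion are assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of a hierarchical cross-attention fusion in the spirit
# of ForensicFormer. Dimensions, layer choices, and token counts are assumed.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    """Fuse low-, mid-, and high-level feature tokens via cross-attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # One projection per paradigm: low-level artifacts, mid-level
        # boundaries, high-level semantics (all hypothetical).
        self.low = nn.Linear(dim, dim)
        self.mid = nn.Linear(dim, dim)
        self.high = nn.Linear(dim, dim)
        # High-level semantic tokens query the lower-level evidence.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)  # real vs. forged logits

    def forward(self, f_low, f_mid, f_high):
        # Concatenate low- and mid-level tokens as the key/value "evidence".
        evidence = torch.cat([self.low(f_low), self.mid(f_mid)], dim=1)
        fused, _ = self.attn(self.high(f_high), evidence, evidence)
        # Pool attended tokens into an image-level prediction.
        return self.head(fused.mean(dim=1))

model = HierarchicalFusion()
logits = model(torch.randn(2, 16, 64),   # low-level artifact tokens
               torch.randn(2, 16, 64),   # mid-level boundary tokens
               torch.randn(2, 4, 64))    # high-level semantic tokens
print(logits.shape)  # torch.Size([2, 2])
```

A per-pixel localization head (the reported 0.76 F1) would attend over spatial tokens instead of pooling them, but the abstract gives no detail on that branch.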