Recent advances in deep generative models have made face videos easier to manipulate, raising significant concerns about misuse for fraud and misinformation. Existing detectors often perform well in-domain but fail to generalize across diverse manipulation techniques because they rely on forgery-specific artifacts. In this work, we introduce DeepShield, a deepfake detection framework that balances local sensitivity and global generalization to improve robustness to unseen forgeries. DeepShield enhances the CLIP-ViT encoder through two key components: Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG applies spatiotemporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies that global models often overlook. GFD introduces domain feature augmentation, using domain-bridging and boundary-expanding feature generation to synthesize diverse forgeries, which mitigates overfitting and improves cross-domain adaptability. By combining local and global analysis, DeepShield outperforms state-of-the-art methods in cross-dataset and cross-manipulation evaluations, showing stronger robustness to unseen deepfake attacks. Code is available at https://github.com/lijichang/DeepShield.
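The abstract names domain-bridging and boundary-expanding feature generation but gives no formulas. The sketch below is one plausible reading, not the authors' implementation: bridging is assumed to be a random interpolation between features from two forgery domains, and boundary expansion is assumed to be an extrapolation of forged features away from the centroid of real features. The helper names `bridge_domains` and `expand_boundary`, the `scale` parameter, and the random toy features are all hypothetical.

```python
import numpy as np


def bridge_domains(feats_a, feats_b, rng):
    """Assumed 'domain-bridging': mix paired features from two forgery domains.

    Each row gets its own mixing weight in [0, 1), so every generated feature
    lies on the segment between its two source features.
    """
    lam = rng.uniform(0.0, 1.0, size=(feats_a.shape[0], 1))
    return lam * feats_a + (1.0 - lam) * feats_b


def expand_boundary(fake_feats, real_feats, scale=0.5):
    """Assumed 'boundary-expanding': push fakes farther from the real centroid.

    scale is a hypothetical hyperparameter; scale=0 returns the input unchanged.
    """
    real_center = real_feats.mean(axis=0, keepdims=True)
    return fake_feats + scale * (fake_feats - real_center)


# Toy stand-ins for encoder features (8 samples, 16 dimensions per domain).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(8, 16))
fake_a = rng.normal(2.0, 1.0, size=(8, 16))   # forgery domain A
fake_b = rng.normal(-2.0, 1.0, size=(8, 16))  # forgery domain B

bridged = bridge_domains(fake_a, fake_b, rng)
expanded = expand_boundary(fake_a, real)

# Original and synthesized forged features together form the augmented set.
augmented_fakes = np.concatenate([fake_a, fake_b, bridged, expanded], axis=0)
```

Under these assumptions, the augmented set would be fed to the real/fake classifier alongside the real features, so that the decision boundary is trained on forgeries that lie between and beyond the observed manipulation domains.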