Recent advances in deep generative models have made face videos easier to manipulate, raising significant concerns about their misuse for fraud and misinformation. Existing detectors often perform well in in-domain scenarios but fail to generalize across diverse manipulation techniques because they rely on forgery-specific artifacts. In this work, we introduce DeepShield, a deepfake detection framework that balances local sensitivity and global generalization to improve robustness against unseen forgeries. DeepShield enhances the CLIP-ViT encoder through two key components: Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG applies spatiotemporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies often overlooked by global models. GFD introduces domain feature augmentation, using domain-bridging and boundary-expanding feature generation to synthesize diverse forgeries, which mitigates overfitting and improves cross-domain adaptability. By integrating local and global analysis, DeepShield outperforms state-of-the-art methods in cross-dataset and cross-manipulation evaluations, showing stronger robustness against unseen deepfake attacks.
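To make the idea of feature-level forgery diversification more concrete, the following is a minimal sketch of what "domain-bridging" and "boundary-expanding" feature generation could look like. The abstract does not specify the actual formulation, so everything here is an assumption for illustration: bridging is modeled as random interpolation between features from two forgery domains, and boundary expansion as pushing forged features away from the centroid of real features. The function names `bridge_domains` and `expand_boundary`, the parameter `max_scale`, and the toy Gaussian features are hypothetical and not taken from DeepShield.

```python
import numpy as np


def bridge_domains(feat_a, feat_b, rng):
    """Hypothetical 'domain-bridging': interpolate between two forgery domains.

    Each row gets its own mixing weight lam in [0, 1), so the synthesized
    features lie on the segment between the paired source features.
    """
    lam = rng.uniform(0.0, 1.0, size=(feat_a.shape[0], 1))
    return lam * feat_a + (1.0 - lam) * feat_b


def expand_boundary(fake_feat, real_center, rng, max_scale=0.5):
    """Hypothetical 'boundary-expanding': push fakes away from the real centroid.

    alpha >= 0, so each synthesized feature is at least as far from
    real_center as the forged feature it was generated from.
    """
    alpha = rng.uniform(0.0, max_scale, size=(fake_feat.shape[0], 1))
    return fake_feat + alpha * (fake_feat - real_center)


# Toy features standing in for encoder outputs (assumed shapes: batch x dim).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(8, 16))
fake_a = rng.normal(2.0, 1.0, size=(8, 16))   # forgery domain A
fake_b = rng.normal(-2.0, 1.0, size=(8, 16))  # forgery domain B

real_center = real.mean(axis=0)
bridged = bridge_domains(fake_a, fake_b, rng)
expanded = expand_boundary(fake_a, real_center, rng)
```

In a setup like this, `bridged` and `expanded` would be added to the forged class during training so that the classifier sees forgery features beyond those produced by the manipulation methods in the training set. Whether DeepShield mixes features in this way is not stated in the abstract.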