The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and the erosion of trust in digital media. Although large-scale multimodal models such as CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning them with beneficial ones distilled from a frozen CLIP encoder, DGS-Net achieves unified optimization of prior preservation and irrelevant-component suppression. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6 percentage points, achieving superior detection performance and generalization across diverse generation techniques.
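The core gradient-surgery step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `align_weight` parameter, and the use of single harmful/beneficial direction vectors are all assumptions made for illustration. It shows only the geometric operation the abstract describes, namely removing the component of the task gradient that lies along a harmful direction and adding a pull toward a beneficial direction.

```python
import numpy as np

def gradient_surgery(g, harmful, beneficial, align_weight=0.1):
    """Illustrative sketch (hypothetical names, not the paper's code).

    g          -- flattened task gradient
    harmful    -- estimated harmful descent direction
    beneficial -- beneficial direction distilled from a frozen encoder
    """
    # Project g onto the orthogonal complement of the harmful direction:
    #   g_perp = g - (g . h_hat) h_hat, with h_hat the unit harmful direction.
    h_hat = harmful / (np.linalg.norm(harmful) + 1e-12)
    g_perp = g - np.dot(g, h_hat) * h_hat
    # Align the surviving gradient with the beneficial direction.
    b_hat = beneficial / (np.linalg.norm(beneficial) + 1e-12)
    return g_perp + align_weight * b_hat

# Toy example: the component along the harmful axis is removed.
g = np.array([1.0, 2.0, 0.0])
harmful = np.array([0.0, 1.0, 0.0])
beneficial = np.array([0.0, 0.0, 1.0])
out = gradient_surgery(g, harmful, beneficial)
```

In practice such directions would live in the parameter space of the fine-tuned encoder, and the distillation from the frozen CLIP model would supply the beneficial direction at each optimization step.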