Generalizing deepfake detectors to unseen manipulation techniques remains a key challenge for practical deployment. Although many approaches adapt foundation models by introducing substantial architectural complexity, this work demonstrates that robust generalization is achievable through parameter-efficient adaptation of a foundational pre-trained vision encoder. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of all parameters) and improves generalization by constraining features to a hyperspherical manifold via L2 normalization and applying metric learning on that manifold. We conduct an extensive evaluation on 14 benchmark datasets spanning 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real and fake data derived from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, as models trained on older, diverse datasets still generalize strongly. This work delivers a computationally efficient and reproducible method, showing that state-of-the-art generalization is attainable through targeted, minimal changes to a pre-trained foundational image encoder. The code is available at: https://github.com/yermandy/GenD
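The abstract names three ingredients: training only LayerNorm parameters, L2-normalizing features onto a unit hypersphere, and metric learning on that hypersphere. The framework-free sketch below illustrates these ideas under stated assumptions and is not the GenD implementation. It identifies LayerNorm parameters by a name substring (a hypothetical naming convention), and it uses the alignment and uniformity losses of Wang and Isola (2020) as a stand-in metric-learning objective; the paper's actual loss may differ. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(feats: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean squared distance between normalized features sharing a label.
    Stand-in objective (assumption): pulls same-class embeddings together."""
    dists = [
        sq_dist(feats[i], feats[j])
        for i in range(len(feats))
        for j in range(i + 1, len(feats))
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0


def uniformity_loss(feats: Sequence[Vector], t: float = 2.0) -> float:
    """log E[exp(-t * ||x - y||^2)] over distinct pairs (needs >= 2 features).
    Lower values mean embeddings are spread more evenly over the sphere."""
    terms = [
        math.exp(-t * sq_dist(feats[i], feats[j]))
        for i in range(len(feats))
        for j in range(i + 1, len(feats))
    ]
    return math.log(sum(terms) / len(terms))


def trainable_fraction(param_counts: Dict[str, int], keyword: str = "layernorm") -> float:
    """Fraction of parameters left trainable when only tensors whose name
    contains `keyword` (assumed to mark LayerNorm weights/biases) are unfrozen."""
    total = sum(param_counts.values())
    trainable = sum(n for name, n in param_counts.items() if keyword in name)
    return trainable / total


if __name__ == "__main__":
    # Toy parameter table (hypothetical sizes, not the real encoder).
    params = {"attn.weight": 9000, "mlp.weight": 990,
              "layernorm.weight": 5, "layernorm.bias": 5}
    print("trainable fraction:", trainable_fraction(params))

    # Toy 2-D features: two "real" samples and one "fake" sample.
    feats = [l2_normalize(v) for v in [[3.0, 4.0], [6.0, 8.0], [-1.0, 0.0]]]
    labels = [0, 0, 1]
    print("alignment:", alignment_loss(feats, labels))
    print("uniformity:", uniformity_loss(feats))
```

In a PyTorch model the same freezing idea would typically be expressed by setting `requires_grad = False` on every parameter except those belonging to `torch.nn.LayerNorm` modules; matching by module type is more robust than matching by name.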