Beyond Known Fakes: Generalized Detection of AI-Generated Images via Post-hoc Distribution Alignment

The rapid proliferation of highly realistic AI-generated images poses serious security threats such as misinformation and identity fraud. Detecting generated images in open-world settings is particularly challenging when they originate from unknown generators. Existing methods typically rely on model-specific artifacts and require retraining on new fake data, which limits their generalization and scalability. In this work, we propose Post-hoc Distribution Alignment (PDA), a generalized and model-agnostic framework for detecting AI-generated images under unknown generative threats. Specifically, PDA reformulates detection as a distribution alignment task by regenerating test images through a known generative model. When real images are regenerated, they inherit the known model's artifacts and align with the known fake distribution. In contrast, regenerated unknown fakes carry incompatible or mixed artifacts and remain misaligned. This difference allows an existing detector, trained on the known generative model, to accurately distinguish real images from unknown fakes without requiring access to unseen data or retraining. Extensive experiments across 16 state-of-the-art generative models, including GANs, diffusion models, and commercial text-to-image APIs (e.g., Midjourney), demonstrate that PDA achieves an average detection accuracy of 96.69%, outperforming the best baseline by 10.71%. Comprehensive ablation studies and robustness analyses further confirm PDA's generalizability and its resilience to distribution shifts and image transformations. Overall, our work provides a practical and scalable solution for real-world AI-generated image detection, where new generative models emerge continuously.
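The detection rule described above can be sketched as a toy example. This is an illustration under stated assumptions, not the authors' implementation: images are abstracted as sets of generator "fingerprints" rather than pixel arrays, and `pda_predict`, `toy_regenerate`, and `toy_known_fake_score` are hypothetical names. In a real pipeline, `regenerate` would pass the image through the known generative model and `known_fake_score` would be the existing detector trained on that model.

```python
from typing import Callable, FrozenSet

# Hypothetical toy stand-in for an image: the set of generator "fingerprints"
# it carries. A real pipeline would operate on pixel tensors instead.
Image = FrozenSet[str]


def pda_predict(
    image: Image,
    regenerate: Callable[[Image], Image],
    known_fake_score: Callable[[Image], float],
    threshold: float = 0.5,
) -> str:
    """Classify `image` as 'real' or 'fake' via post-hoc distribution alignment.

    1. Regenerate the test image through the known generative model.
    2. Score the regenerated image with a detector trained on that model.
    3. Aligned with the known fake distribution -> the input was real;
       misaligned (incompatible or mixed artifacts) -> an unknown fake.
    """
    regenerated = regenerate(image)
    score = known_fake_score(regenerated)
    # Note the inverted decision: a high known-fake score means the input was real.
    return "real" if score >= threshold else "fake"


# --- Hypothetical toy components (illustrative only, not from the paper) ---
KNOWN = "known_model_artifact"


def toy_regenerate(image: Image) -> Image:
    # Regeneration stamps the known model's artifact onto whatever is already present.
    return image | {KNOWN}


def toy_known_fake_score(image: Image) -> float:
    # The toy detector fires only on images carrying exactly the known artifact;
    # mixed artifacts fall outside the distribution it was trained on.
    return 1.0 if image == frozenset({KNOWN}) else 0.0


if __name__ == "__main__":
    real_photo: Image = frozenset()
    unknown_fake: Image = frozenset({"unseen_generator_artifact"})
    print(pda_predict(real_photo, toy_regenerate, toy_known_fake_score))    # real
    print(pda_predict(unknown_fake, toy_regenerate, toy_known_fake_score))  # fake
```

The key design point the sketch highlights is that the existing detector is reused unchanged: only the input is transformed, and the decision is read as "aligned after regeneration implies real", so no retraining on unseen generators is needed.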
