The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors: existing methods often rely on model-specific features, leading to overfitting and poor cross-model generalization. In this paper, we introduce the Multi-Cue Aggregation Network (MCAN), a novel framework that integrates distinct yet complementary cues in a unified network. MCAN employs a mixture-of-encoders adapter to process these cues dynamically, enabling more adaptive and robust feature representations. Our cues include the input image itself, which captures the overall content, and high-frequency components, which emphasize edge details. Additionally, we introduce a Chromatic Inconsistency (CI) cue, which normalizes intensity values to expose the noise introduced during image acquisition in real photographs, making these noise patterns more distinguishable from those in AI-generated content. Unlike prior methods, MCAN's novelty lies in its unified multi-cue aggregation framework, which integrates spatial, frequency-domain, and chromaticity-based information for enhanced representation learning. Because these cues are intrinsically more indicative of real images, they improve cross-model generalization. Extensive experiments on the GenImage, Chameleon, and UniversalFakeDetect benchmarks validate the state-of-the-art performance of MCAN. On the GenImage dataset, MCAN outperforms the best state-of-the-art method by up to 7.4% in average accuracy (ACC) across eight different image generators.
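To make the three-cue design concrete, the following is a minimal sketch of how such cues could be extracted from an RGB image. The abstract does not specify the exact formulations; the Gaussian high-pass residual and the R/(R+G+B)-style chromaticity normalization below are illustrative assumptions, not the authors' definitions.

```python
# Illustrative sketch of MCAN-style input cues (assumed formulations,
# not the paper's exact method).
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_cues(img: np.ndarray, sigma: float = 2.0, eps: float = 1e-6):
    """img: float32 RGB image in [0, 1], shape (H, W, 3)."""
    # Cue 1: the input image itself, representing overall content.
    content = img

    # Cue 2: high-frequency residual emphasizing edge details
    # (assumed here as image minus its Gaussian-blurred version).
    low = gaussian_filter(img, sigma=(sigma, sigma, 0))
    high_freq = img - low

    # Cue 3: chromatic inconsistency (CI) cue. Dividing each pixel by its
    # total intensity suppresses luminance, leaving chromatic noise
    # patterns that differ between real camera pipelines and generators.
    intensity = img.sum(axis=-1, keepdims=True) + eps
    chroma = img / intensity

    return content, high_freq, chroma
```

In MCAN, each cue would then be routed through the mixture-of-encoders adapter so the network can weight the cues adaptively per input.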