CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images

Recent advances in synthetic data generation have produced images of such high quality that humans cannot distinguish real photographs from Artificial Intelligence (AI) generated images. Given the critical importance of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. First, a synthetic dataset is generated with latent diffusion that mirrors the ten classes of the existing CIFAR-10 dataset, providing a contrasting set of images for comparison with real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. Together, the two sets of data pose a binary classification problem: is a given image a real photograph or generated by AI? This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories: Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach classified the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping (Grad-CAM) to explore which features within the images are useful for classification. Interpretation reveals that the depicted entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.
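To make the explainability step concrete, the following is a minimal, dependency-free sketch of how a class activation heatmap is commonly assembled. It assumes the standard Grad-CAM formulation (each channel weighted by the spatial average of its gradient, followed by a ReLU), which the abstract does not spell out. The function name `grad_cam` and the toy inputs are hypothetical, and in practice the activations and gradients would come from the last convolutional layer of the trained CNN rather than from hand-written lists.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM style heatmap (illustrative sketch).

    activations: K channels, each an H x W nested list of floats
                 (assumed output of the last convolutional layer).
    gradients:   same shape; gradient of the class score with respect
                 to each activation (assumed to be computed elsewhere).
    """
    # Channel weight = average of that channel's gradient over all positions.
    weights = [sum(map(sum, g)) / (len(g) * len(g[0])) for g in gradients]

    height, width = len(activations[0]), len(activations[0][0])

    # Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: two 2x2 feature maps with constant gradients of +1 and -1.
toy_activations = [[[2.0, 0.0], [1.0, 0.0]], [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy example only the top-left cell receives positive evidence, so it is the only non-zero entry in `heatmap`. On real images, bright regions of such a map indicate where the classifier looked, which is how the study could observe attention falling on background imperfections rather than on the depicted object.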
