Handcrafted Feature Fusion for Reliable Detection of AI-Generated Images

The rapid progress of generative models has enabled the creation of highly realistic synthetic images, raising concerns about authenticity and trust in digital media. Reliably detecting such synthetic content is an urgent challenge. While deep learning approaches dominate the current literature, handcrafted features remain attractive for their interpretability, efficiency, and generalizability. In this paper, we systematically evaluate handcrafted descriptors, including raw pixels, color histograms, the Discrete Cosine Transform (DCT), the Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), the Gray-Level Co-occurrence Matrix (GLCM), and wavelet features, on the CIFAKE dataset of real versus synthetic images. Using 50,000 training and 10,000 test samples, we benchmark seven classifiers, ranging from Logistic Regression to gradient-boosted ensembles (LightGBM, XGBoost, CatBoost). LightGBM consistently outperforms the alternatives: with mixed features it achieves a PR-AUC of 0.9879, a ROC-AUC of 0.9878, an F1 score of 0.9447, and a Brier score of 0.0414, clearly improving both calibration and discrimination over simpler descriptors. Across the three feature configurations (baseline, advanced, and mixed), performance improves monotonically, confirming that combining diverse handcrafted features yields a substantial benefit. These findings highlight the continued relevance of carefully engineered features and ensemble learning for detecting synthetic images, particularly where interpretability and computational efficiency are critical.
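To make the idea of a "mixed" feature configuration concrete, the following is a minimal NumPy-only sketch of fusing several handcrafted descriptors into one vector per image. It is not the authors' pipeline. The bin counts, the 4×4 low-frequency DCT block, the 8-level horizontal-offset GLCM, the single-level Haar transform, and all function names (such as the hypothetical `extract_fused_features`) are illustrative assumptions. The orientation histogram is a global simplification of HOG without cells or block normalization. Inputs are assumed to be 32×32 uint8 RGB images, as in CIFAKE.

```python
import numpy as np

# ITU-R BT.601 luma weights for RGB -> grayscale conversion.
LUMA = np.array([0.299, 0.587, 0.114])
# NOTE: every bin count / block size below is a hypothetical choice, not from the paper.


def color_histogram(img, bins=8):
    """Per-channel intensity histogram; each channel is normalized to sum to 1."""
    feats = []
    for c in range(img.shape[2]):
        hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def dct_basis(size):
    """Orthonormal DCT-II matrix: row u holds the u-th cosine basis vector."""
    u = np.arange(size)[:, None]
    x = np.arange(size)[None, :]
    mat = np.sqrt(2.0 / size) * np.cos(np.pi * (2 * x + 1) * u / (2 * size))
    mat[0, :] = np.sqrt(1.0 / size)
    return mat


def dct_features(gray, k=4):
    """Top-left k x k block (lowest frequencies) of the 2D DCT-II."""
    n, m = gray.shape
    coeffs = dct_basis(n) @ gray @ dct_basis(m).T
    return coeffs[:k, :k].ravel()


def orientation_histogram(gray, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (HOG-like)."""
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # fold to [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0, np.pi), weights=magnitude)
    return hist / (hist.sum() + 1e-12)


def lbp_histogram(gray):
    """Normalized histogram of basic 8-neighbor LBP codes over interior pixels."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()


def glcm_features(gray, levels=8):
    """Contrast, homogeneity, and energy of a horizontal-offset GLCM."""
    q = np.clip((gray / 256.0 * levels).astype(int), 0, levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # count pixel pairs
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    energy = np.sum(glcm ** 2)
    return np.array([contrast, homogeneity, energy])


def haar_features(gray):
    """Mean and std of the approximation and three detail subbands (1-level Haar)."""
    g = gray[: gray.shape[0] // 2 * 2, : gray.shape[1] // 2 * 2]  # crop to even size
    a, b = g[0::2, 0::2], g[0::2, 1::2]
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    bands = ((a + b + c + d) / 2, (a - b + c - d) / 2,
             (a + b - c - d) / 2, (a - b - c + d) / 2)
    return np.array([f(band) for band in bands for f in (np.mean, np.std)])


def extract_fused_features(img, include_raw=True):
    """Hypothetical helper: concatenate all descriptors for one (H, W, 3) uint8 image."""
    gray = img.astype(np.float64) @ LUMA
    parts = [
        color_histogram(img),
        dct_features(gray),
        orientation_histogram(gray),
        lbp_histogram(gray),
        glcm_features(gray),
        haar_features(gray),
    ]
    if include_raw:
        parts.insert(0, img.astype(np.float64).ravel() / 255.0)  # raw pixels in [0, 1]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
X = np.stack([extract_fused_features(im) for im in images])
print(X.shape)  # (4, 3388)
```

Each row of `X` can then be passed to any tabular classifier. For example, `lightgbm.LGBMClassifier` provides scikit-learn-style `fit` and `predict_proba` methods, and the predicted probabilities are what probability-based metrics such as PR-AUC, ROC-AUC, and the Brier score are computed from. Keeping each descriptor in its own function makes it straightforward to compare smaller feature subsets against the full concatenation.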
