With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary content, fueling the growing threat of visual disinformation. A key enabler for generating high-resolution images at low computational cost has been the development of latent diffusion models (LDMs). In contrast to conventional diffusion models, LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space. Despite their relevance, the forensic analysis of LDMs is still in its infancy. In this work, we propose AEROBLADE, a novel detection method that exploits an inherent component of LDMs: the AE used to transform images between image and latent space. We find that generated images can be reconstructed more accurately by the AE than real images, which allows for a simple detection approach based on the reconstruction error. Most importantly, our method is easy to implement and does not require any training, yet nearly matches the performance of detectors that rely on extensive training. We empirically demonstrate that AEROBLADE is effective against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, our approach allows for the qualitative analysis of images, which can be leveraged for identifying inpainted regions.
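The detection idea described in the abstract (encode an image, decode it again, and treat a low reconstruction error as evidence of generation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_encode` and `toy_decode` are hypothetical stand-ins for an LDM's pre-trained AE, mean squared error is an assumed choice of distance (the abstract only speaks of "the reconstruction error"), and the threshold value is arbitrary. It is not the authors' implementation.

```python
import numpy as np

def toy_encode(img, factor=2):
    """Hypothetical stand-in encoder: average-pool factor x factor blocks.

    A real detector would use the pre-trained AE of an LDM instead.
    """
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def toy_decode(latent, factor=2):
    """Hypothetical stand-in decoder: nearest-neighbour upsampling."""
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

def reconstruction_error(img, factor=2):
    """Mean squared error between an image and its AE reconstruction.

    MSE is an assumption; any image distance could be substituted here.
    """
    recon = toy_decode(toy_encode(img, factor), factor)
    return float(np.mean((img - recon) ** 2))

def is_generated(img, threshold, factor=2):
    """Flag an image as generated if its reconstruction error is below threshold.

    The threshold is a hypothetical parameter that would have to be chosen
    on reference data; no training of the AE itself is involved.
    """
    return reconstruction_error(img, factor) < threshold

rng = np.random.default_rng(0)

# "Real-like" image: contains fine detail the toy AE cannot represent.
real_like = rng.random((8, 8))

# "Generated-like" image: produced by the decoder, so it lies in its range.
generated_like = toy_decode(rng.random((4, 4)))

err_real = reconstruction_error(real_like)
err_gen = reconstruction_error(generated_like)
print(f"real-like error:      {err_real:.6f}")
print(f"generated-like error: {err_gen:.6f}")
```

In this toy setting the decoder output is reconstructed almost perfectly, while the image with independent per-pixel detail is not, which mirrors the gap in reconstruction error that the method relies on. With a real LDM autoencoder the gap would be a matter of degree rather than near-zero versus non-zero.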