In recent years, computer vision has made remarkable progress, driven by architectures such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), diffusion-based models, Vision Transformers (ViTs), and, more recently, Vision-Language Models (VLMs). This progress has made generated visual content increasingly realistic and diverse. However, these advances in image generation also raise concerns about misuse in areas such as misinformation, identity theft, and threats to privacy and security. In parallel, Mamba-based architectures have emerged as versatile tools for a range of image analysis tasks, including classification, segmentation, medical imaging, object detection, and image restoration. Their potential for identifying AI-generated images, however, remains relatively unexplored compared to established techniques. This study provides a systematic evaluation and comparative analysis of Vision Mamba models for AI-generated image detection. We benchmark multiple Vision Mamba variants against representative CNNs, ViTs, and VLM-based detectors across multiple datasets and synthetic image sources, focusing on accuracy, efficiency, and generalizability across image types and generative models. Through this analysis, we aim to clarify Vision Mamba's strengths and limitations relative to established methods. Overall, our findings highlight both the promise and the current limitations of Vision Mamba as a component in systems designed to distinguish authentic from AI-generated visual content. Such evaluation matters at a time when distinguishing real from AI-generated content is a major challenge.