Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation

In recent years, computer vision has advanced rapidly, driven by architectures such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), diffusion-based architectures, Vision Transformers (ViTs), and, more recently, Vision-Language Models (VLMs). This progress has made generated visual content increasingly realistic and diverse, which in turn raises concerns about misuse in areas such as misinformation, identity theft, and threats to privacy and security. In parallel, Mamba-based architectures have emerged as versatile tools for a range of image analysis tasks, including classification, segmentation, medical imaging, object detection, and image restoration. However, their potential for identifying AI-generated images remains relatively unexplored compared to established techniques. This study provides a systematic evaluation and comparative analysis of Vision Mamba models for AI-generated image detection. We benchmark multiple Vision Mamba variants against representative CNNs, ViTs, and VLM-based detectors across multiple datasets and synthetic image sources, focusing on accuracy, efficiency, and generalizability across image types and generative models. Through this analysis, we aim to clarify Vision Mamba's strengths and limitations relative to established methods. Overall, our findings highlight both the promise and the current limitations of Vision Mamba as a component in systems designed to distinguish authentic from AI-generated visual content, at a time when telling the two apart is a major challenge.

