Detecting Multimedia Generated by Large AI Models: A Survey

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era in which AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruption, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, and related research has risen markedly. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. To address this gap, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we present a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.
