Detecting Multimedia Generated by Large AI Models: A Survey

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has ushered in a new era in which AI-generated multimedia is increasingly integrated into many aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruption, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, and related research has risen markedly. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. To address this gap, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy of detection methods, categorized by media modality and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes such as generalizability, robustness, and interpretability to detectors). Additionally, we present a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. With this survey, we aim to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.

翻译：大型AI模型（LAIMs）的快速发展，特别是扩散模型和大型语言模型，标志着一个新时代的到来，其中AI生成的多媒体内容正日益融入日常生活的各个方面。尽管这些内容在许多领域大有裨益，但也带来了重大风险，包括潜在的滥用、社会混乱和伦理问题。因此，检测由LAIMs生成的多媒体内容变得至关重要，相关研究也显著增加。然而，目前仍缺乏系统性的综述，专门聚焦于检测LAIMs生成的多媒体内容。为填补这一空白，我们首次全面综述了现有关于检测由LAIMs生成的多媒体内容（如文本、图像、视频、音频和多模态内容）的研究。具体而言，我们引入了一种新颖的检测方法分类法，按媒体模态进行划分，并对应两个视角：纯检测（旨在提升检测性能）和超越检测（为检测器添加泛化性、鲁棒性和可解释性等属性）。此外，我们简要介绍了生成机制、公共数据集和在线检测工具，为该领域的研究人员和从业者提供了宝贵资源。进一步地，我们识别了当前检测中的挑战，并提出了未来研究方向，以应对检测LAIMs生成的多媒体内容中未探索、持续和新出现的问题。本综述旨在填补学术空白，为全球AI安全努力做出贡献，助力确保数字领域信息的完整性。项目链接为https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey。