Detecting Multimedia Generated by Large AI Models: A Survey

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has ushered in an era in which AI-generated multimedia is increasingly integrated into daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruption, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, and related research has risen markedly. Despite this, systematic surveys that focus specifically on detecting LAIM-generated multimedia remain notably lacking. To address this gap, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes such as generalizability, robustness, and interpretability to detectors). Additionally, we present a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.
