A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind their impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing high-quality content to be produced at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and then generating content according to the model's knowledge and that intent information. In recent years, large-scale models have become increasingly important in AIGC because they provide better intent extraction and thus improved generation results. As the data and the models grow, the distribution that a model can learn becomes more comprehensive and closer to reality, leading to more realistic and higher-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from the perspectives of unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and related models for text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.
