Image generation has advanced rapidly over the past decade, yet the literature remains fragmented across models and application domains. This paper offers a comprehensive survey of breakthrough image generation models, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, autoregressive and transformer-based generators, and diffusion-based methods. For each model type, we provide a detailed technical walkthrough of its underlying objective, architectural building blocks, and algorithmic training steps. We also discuss the relevant optimization techniques, common failure modes, and limitations. We further review recent developments in video generation and the research that made it possible to move from still frames to high-quality video. Finally, we address the growing importance of robustness and responsible deployment of these models, covering deepfake risks, detection, artifacts, and watermarking.