Image generation has advanced rapidly over the past decade, yet the literature remains fragmented across model families and application domains. This paper offers a comprehensive survey of breakthrough image generation models, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, autoregressive and transformer-based generators, and diffusion-based methods. We provide a detailed technical walkthrough of each model type, covering its underlying objective, architectural building blocks, and algorithmic training steps. For each model type, we also present optimization techniques, common failure modes, and limitations. We then review recent developments in video generation and the research that made it possible to move from still frames to high-quality video. Lastly, we discuss the growing importance of robustness and responsible deployment of these models, including deepfake risks, detection, artifacts, and watermarking.