Large models have advanced rapidly in recent years. Following the success of ChatGPT, numerous language models have been introduced and have demonstrated remarkable performance. Image generation has seen similar advances: models such as Google's Imagen, OpenAI's DALL-E 2, and Stable Diffusion have shown impressive capabilities in generating images. Like large language models, however, these models still face unresolved challenges. Fortunately, the availability of open-source Stable Diffusion models and of their underlying mathematical principles has enabled the academic community to analyze the performance of current image generation models extensively and to build improvements on the Stable Diffusion framework. This survey examines the existing issues in image generation models and the solutions proposed for them so far.