This survey reviews text-to-image diffusion models, motivated by the emergence of diffusion models as a popular approach to a wide range of generative tasks. To be self-contained, the survey begins with a brief introduction to how a basic diffusion model synthesizes images, followed by how conditioning or guidance improves learning. Building on this, we review state-of-the-art methods for text-conditioned image synthesis, i.e., text-to-image generation. We further summarize applications beyond text-to-image generation: text-guided creative generation and text-guided image editing. Finally, we discuss open challenges and promising future directions.