Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen efficiently generates high-quality images conditioned on textual descriptions in a single step. Beyond standard text-to-image generation, UFOGen is also versatile across downstream applications. Notably, UFOGen is among the first models to enable one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.