In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, in unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding or removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works such as ControlNet for Sketch2Photo and Edge2Image, while requiring only a single inference step. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.