In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, in unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding or removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works such as ControlNet for Sketch2Photo and Edge2Image, while requiring only a single inference step. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.