Diffusion distillation is a highly promising direction for achieving faithful text-to-image generation in a few sampling steps. However, despite recent successes, existing distilled models still do not provide the full spectrum of diffusion abilities, such as real image inversion, which enables many precise image manipulation methods. This work aims to enrich distilled text-to-image diffusion models with the ability to effectively encode real images into their latent space. To this end, we introduce invertible Consistency Distillation (iCD), a generalized consistency distillation framework that facilitates both high-quality image synthesis and accurate image encoding in only 3-4 inference steps. Although the inversion problem for text-to-image diffusion models is exacerbated by high classifier-free guidance scales, we observe that dynamic guidance significantly reduces reconstruction errors without noticeable degradation in generation performance. As a result, we demonstrate that iCD equipped with dynamic guidance may serve as a highly effective tool for zero-shot text-guided image editing, competing with more expensive state-of-the-art alternatives.