Boosting Latent Diffusion with Flow Matching

Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which latent diffusion only partially alleviates. Flow matching is an appealing complement: it trains and samples faster, but its synthesis is less diverse. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small latent space to a high-dimensional one. These latents are then projected to high-resolution images by the subsequent convolutional decoder of the latent diffusion approach. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, we achieve state-of-the-art high-resolution image synthesis at $1024^2$ pixels with minimal computational cost. By scaling our method up further, we can reach resolutions of up to $2048^2$ pixels. Importantly, our approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easy to integrate into various diffusion model frameworks.
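The role of flow matching described above, mapping a small latent to a high-dimensional one, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a straight-line path between an upsampled low-resolution latent and a high-resolution latent, nearest-neighbour upsampling, and Euler integration at inference, none of which the abstract specifies. All function names, shapes, and the `oracle` velocity field are hypothetical stand-ins for a trained network, and the frozen diffusion model and convolutional decoder are not modelled.

```python
import numpy as np

def upsample_nearest(z, factor):
    """Nearest-neighbour upsampling of a (C, H, W) latent by an integer factor."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, and its target velocity."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # constant along a straight path (an assumption of this sketch)
    return x_t, v_target

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted and the target velocity."""
    x_t, v_target = flow_matching_pair(x0, x1, t)
    return float(np.mean((velocity_fn(x_t, t) - v_target) ** 2))

def euler_integrate(velocity_fn, x0, num_steps):
    """Transport x0 from t=0 to t=1 with fixed-step Euler integration."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
z_low = rng.standard_normal((4, 8, 8))   # stand-in for a small diffusion latent
x1 = rng.standard_normal((4, 32, 32))    # stand-in for a high-resolution latent
x0 = upsample_nearest(z_low, 4)          # starting point of the flow

# Hypothetical "oracle" velocity field for this single pair; in practice a
# trained network would take its place.
oracle = lambda x, t: x1 - x0

loss = flow_matching_loss(oracle, x0, x1, t=0.3)
x_final = euler_integrate(oracle, x0, num_steps=10)
```

With the oracle field the loss is zero and the integration lands on `x1`. A learned field would only approximate this, and in the pipeline described above its output would then be passed to the convolutional decoder.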

