PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

To accelerate sampling, diffusion models (DMs) are often distilled into generators that map noise directly to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique that progressively grows the resolution of the generator beyond that of the original teacher DM. Our key insight is that a pre-trained, low-resolution DM can deterministically encode high-resolution data into a structured latent space by solving the probability-flow ODE (PF-ODE) forward in time (data-to-noise), starting from an appropriately down-sampled image. Using this frozen encoder in an autoencoder framework, we train a decoder whose resolution is grown progressively. Because the decoder grows progressively, PaGoDA does not need to re-train the teacher or student models when the student is upsampled, which makes the whole training pipeline much cheaper. In experiments, we used our progressively growing decoder to upsample from the pre-trained model's 64x64 resolution to generate 512x512 samples, achieving 2x faster inference than single-step distilled Stable Diffusion models such as LCM. PaGoDA also achieved state-of-the-art FIDs on ImageNet at all resolutions from 64x64 to 512x512. Additionally, we demonstrated PaGoDA's effectiveness in solving inverse problems and enabling controllable generation.
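The encoding step described above (down-sample, then solve the PF-ODE forward from data to noise with the frozen low-resolution teacher) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the variance-exploding PF-ODE in the EDM parameterization, dx/dt = (x - D(x, t)) / t, where D is the teacher's denoiser; it substitutes a hypothetical closed-form Gaussian denoiser (`gaussian_denoiser`) for the pre-trained 64x64 teacher; it uses average pooling (`downsample`) as the "appropriate" down-sampling; and it integrates with Heun's method on a geometric time grid. All function names and parameter values (`t_min`, `t_max`, `steps`, `s`) are hypothetical choices for the sketch.

```python
import numpy as np


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor.

    Assumption: a simple stand-in for the paper's "appropriate" down-sampling.
    """
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def gaussian_denoiser(x: np.ndarray, t: float, mu: float = 0.0, s: float = 0.5) -> np.ndarray:
    """Hypothetical toy teacher: exact E[x0 | x_t] for x0 ~ N(mu, s^2 I), x_t = x0 + t * noise."""
    return mu + (s**2 / (s**2 + t**2)) * (x - mu)


def encode_pf_ode(x0, denoiser, t_min=0.002, t_max=80.0, steps=200):
    """Integrate dx/dt = (x - D(x, t)) / t forward in time (data -> noise).

    Heun's 2nd-order method on a geometric time grid. No noise is injected,
    so the result is a deterministic function of x0 (i.e. usable as an encoder).
    """
    ts = np.geomspace(t_min, t_max, steps + 1)
    x = np.asarray(x0, dtype=np.float64)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur
        d_cur = (x - denoiser(x, t_cur)) / t_cur                # slope at current time
        x_pred = x + dt * d_cur                                  # Euler predictor
        d_next = (x_pred - denoiser(x_pred, t_next)) / t_next    # slope at predicted point
        x = x + dt * 0.5 * (d_cur + d_next)                      # trapezoidal (Heun) corrector
    return x


# Example: encode a random 512x512 "image" through a 64x64 teacher.
rng = np.random.default_rng(0)
hi_res = rng.normal(0.0, 0.5, size=(512, 512, 3))
lo_res = downsample(hi_res, 8)                 # 64x64x3, the teacher's resolution
latent = encode_pf_ode(lo_res, gaussian_denoiser)
print(lo_res.shape, latent.shape)              # (64, 64, 3) (64, 64, 3)
```

Because the toy denoiser is the exact posterior mean for Gaussian data, this ODE has the closed-form solution x(T) = mu + sqrt(s^2 + T^2) / sqrt(s^2 + t_min^2) * (x0 - mu), which gives a convenient check on the solver. With a real teacher, the resulting latent would play the role of the frozen encoder's output that the decoder learns to map back to the high-resolution image.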
