Accelerating diffusion model sampling is crucial for efficient AIGC deployment. While diffusion distillation methods -- based on distribution matching and trajectory matching -- reduce sampling to as few as one step, they fall short on complex tasks like text-to-image generation. Few-step generation offers a better balance between speed and quality, but existing approaches face a persistent trade-off: distribution matching lacks flexibility for multi-step sampling, while trajectory matching often yields suboptimal image quality. To bridge this gap, we propose learning few-step diffusion models by Trajectory Distribution Matching (TDM), a unified distillation paradigm that combines the strengths of distribution and trajectory matching. Our method introduces a data-free score distillation objective that aligns the student's trajectory with the teacher's at the distribution level. Further, we develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling. This approach supports both deterministic sampling for superior image quality and flexible multi-step adaptation, achieving state-of-the-art performance with remarkable efficiency. Our model, TDM, outperforms existing methods on various backbones, such as SDXL and PixArt-$\alpha$, delivering superior quality at significantly reduced training cost. In particular, our method distills PixArt-$\alpha$ into a 4-step generator that outperforms its teacher in real user preference at 1024 resolution. This is accomplished with 500 training iterations and 2 A800 GPU-hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM outperforms its teacher model (CogVideoX-2B) using only 4 NFE on VBench, improving the total score from 80.91 to 81.65. Project page: https://tdm-t2x.github.io/
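To make the abstract's central idea concrete, the following is a minimal, illustrative sketch (not the paper's actual code) of a distribution-matching score-distillation loss of the kind the abstract describes. The function names `teacher_score` and `fake_score` are hypothetical stand-ins: in practice the former would be a pretrained teacher diffusion model and the latter an auxiliary network tracking the student generator's output distribution.

```python
import torch

def score_distillation_loss(student_out, teacher_score, fake_score, t):
    """Sketch of a data-free score-distillation objective.

    The gradient direction is the difference between the score of the
    student's own distribution and the teacher's score, so minimizing
    the surrogate pushes the student's samples toward the teacher's
    distribution (assumed form, in the style of distribution matching).
    """
    with torch.no_grad():
        # Gradient signal: fake (student-distribution) score minus
        # real (teacher) score at noise level t.
        grad = fake_score(student_out, t) - teacher_score(student_out, t)
    # Surrogate loss whose gradient w.r.t. student_out equals `grad`.
    return (grad * student_out).sum()
```

A trajectory-distribution variant would apply such a loss at several intermediate timesteps along the sampling trajectory, with per-step targets decoupled as described above.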