Denoising Task Routing for Diffusion Models

Diffusion models generate highly realistic images by learning a multi-step denoising process, which naturally embodies the principles of multi-task learning (MTL). Despite this inherent connection between diffusion models and MTL, the design of neural architectures that explicitly incorporate MTL into diffusion models remains unexplored. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures that establishes distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is how it builds prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts the activated channels as a sliding window across timesteps, capitalizing on the strong affinity between tasks at adjacent timesteps. (2) Task Weights: During the early stages (higher timesteps) of the denoising process, DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich content in earlier stages and focus on simple noise removal in later stages. Our experiments show that DTR consistently improves diffusion models' performance across different evaluation protocols without adding extra parameters, and that it also accelerates training convergence. Finally, we show that our architectural approach is complementary to existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Notably, by leveraging this complementarity, we match the performance of DiT-XL with the smaller DiT-L while reducing training iterations from 7M to 2M.
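The two design principles above can be illustrated with a minimal sketch. The abstract does not give the exact masking formula, so everything here is an assumption for illustration: the function names `routing_mask` and `route`, the parameter `beta` (fraction of channels kept active), and the parameter `alpha` (an exponent that makes the window move faster at higher timesteps) are hypothetical. The sketch activates one contiguous window of channels per timestep, so adjacent timesteps share most of their channels (task affinity). With `alpha > 1` the window shifts more per step at higher timesteps, so those tasks share fewer channels with their neighbors (task weights).

```python
import numpy as np


def routing_mask(t, num_timesteps, num_channels, beta=0.8, alpha=4.0):
    """Return a binary channel mask for denoising timestep t (1-indexed).

    Assumed scheme (not confirmed by the abstract): a window of
    int(beta * num_channels) channels whose start index grows as
    (t / num_timesteps) ** alpha.
    """
    active = int(beta * num_channels)            # window width, fixed over t
    position = (t / num_timesteps) ** alpha      # in (0, 1], nonlinear in t
    start = int((num_channels - active) * position)
    mask = np.zeros(num_channels, dtype=np.float32)
    mask[start:start + active] = 1.0             # contiguous sliding window
    return mask


def route(features, t, num_timesteps, beta=0.8, alpha=4.0):
    """Zero out inactive channels; features has channels on the last axis."""
    mask = routing_mask(t, num_timesteps, features.shape[-1], beta, alpha)
    return features * mask                       # no learnable parameters


if __name__ == "__main__":
    T, C = 1000, 64
    low, mid, high = (routing_mask(t, T, C, beta=0.5, alpha=2.0)
                      for t in (1, 500, 1000))
    # The window moves less over the first half of the timesteps
    # than over the second half.
    print(int(np.argmax(low)), int(np.argmax(mid)), int(np.argmax(high)))
```

Because the mask is fixed rather than learned, this kind of routing adds no parameters, which is consistent with the abstract's claim. How the mask is placed inside each block of the network is not specified here.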

