Denoising Task Routing for Diffusion Models

Diffusion models generate highly realistic images by learning a multi-step denoising process, which naturally embodies the principles of multi-task learning (MTL). Despite this inherent connection between diffusion models and MTL, the design of neural architectures that explicitly incorporate MTL into the diffusion framework remains unexplored. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures that establishes distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is its seamless integration of prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts the activated channels as a sliding window across timesteps, capitalizing on the strong affinity between tasks at adjacent timesteps. (2) Task Weights: During the early stages (higher timesteps) of the denoising process, DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich content in earlier stages and focus on simple noise removal in later stages. Our experiments show that DTR consistently boosts diffusion models' performance across different evaluation protocols without adding extra parameters, and that it also accelerates training convergence. Finally, we show that our architectural approach is complementary to existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Notably, by leveraging this complementarity, we match the performance of DiT-XL with the smaller DiT-L while reducing training iterations from 7M to 2M.
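The two routing principles above can be illustrated with a small sketch. The abstract does not give the exact masking rule, so everything below is an assumption for illustration: the function names, the `beta` ratio of active channels, and the power-law shift controlled by `alpha` are hypothetical choices, not necessarily the authors' formulation. The sketch builds a fixed binary channel mask per timestep whose window slides across the channel axis (Task Affinity) and slides faster at higher timesteps, so that neighbouring high-noise tasks share fewer channels (Task Weights).

```python
import numpy as np


def routing_mask(t, num_timesteps, num_channels, beta=0.8, alpha=2.0):
    """Binary channel mask for timestep t in 1..num_timesteps.

    Assumptions (not taken from the abstract):
      beta  - fraction of channels active for any single timestep
      alpha - exponent > 1 makes the window move faster at high timesteps
    """
    window = int(round(beta * num_channels))       # active channels per task
    frac = (t / num_timesteps) ** alpha            # position of the window in [0, 1]
    start = int(np.floor((num_channels - window) * frac))
    mask = np.zeros(num_channels, dtype=np.float32)
    mask[start:start + window] = 1.0               # contiguous sliding window
    return mask


def shared_channels(t_a, t_b, num_timesteps, num_channels, beta=0.8, alpha=2.0):
    """Number of channels active for both timesteps."""
    m_a = routing_mask(t_a, num_timesteps, num_channels, beta, alpha)
    m_b = routing_mask(t_b, num_timesteps, num_channels, beta, alpha)
    return int(np.sum(m_a * m_b))


def route(features, t, num_timesteps, beta=0.8, alpha=2.0):
    """Apply the mask to features of shape (batch, tokens, channels)."""
    mask = routing_mask(t, num_timesteps, features.shape[-1], beta, alpha)
    return features * mask                         # broadcasts over batch and tokens


if __name__ == "__main__":
    T, C = 1000, 100
    # Low-noise timesteps far apart still share every active channel ...
    print(shared_channels(1, 100, T, C))
    # ... while high-noise timesteps the same distance apart share fewer.
    print(shared_channels(900, 1000, T, C))
    x = np.ones((2, 4, C), dtype=np.float32)
    print(route(x, 500, T).sum(axis=-1)[0, 0])
```

Because the mask is a fixed function of the timestep, this kind of routing adds no learnable parameters, which is consistent with the abstract's claim of improving performance without extra parameters.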

