Diffusion models generate highly realistic images by learning a multi-step denoising process, which naturally embodies the principles of multi-task learning (MTL). Despite this inherent connection, the design of neural architectures that explicitly incorporate MTL into the framework of diffusion models remains largely unexplored. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures that establishes distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is that it seamlessly integrates prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts the activated channels as a sliding window across timesteps, capitalizing on the strong inherent affinity between tasks at adjacent timesteps. (2) Task Weights: during the early stages (higher timesteps) of the denoising process, DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich content in earlier stages and focus on simple noise removal in later stages. Our experiments reveal that DTR not only consistently boosts diffusion models' performance across different evaluation protocols without adding extra parameters, but also accelerates training convergence. Finally, we show the complementarity between our architectural approach and existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Notably, by leveraging this complementarity, we match the performance of DiT-XL with the smaller DiT-L while reducing training iterations from 7M to 2M.
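To make the two priors concrete, here is a minimal sketch of a DTR-style routing mask. It assumes a contiguous window of `round(beta * C)` active channels whose start index slides from channel 0 at t = 0 to the last possible position at t = T following `(t / T) ** alpha`. The function names and the parameters `beta` and `alpha` are hypothetical illustrations, not the paper's published code, and the exact schedule used in the paper may differ. Adjacent timesteps get overlapping windows (task affinity), and with `alpha > 1` the window moves faster at high timesteps, so neighbouring early-stage tasks share fewer channels and each gets more task-specific ones (task weights).

```python
from typing import List


def routing_mask(t: int, T: int, C: int, beta: float, alpha: float) -> List[int]:
    """Hypothetical DTR-style channel mask for the denoising task at timestep t in [0, T].

    A contiguous window of round(beta * C) channels is active (1); all others are off (0).
    The window start moves from 0 at t = 0 to C - width at t = T along (t / T) ** alpha,
    so nearby timesteps get overlapping windows (assumed schedule, for illustration).
    """
    width = int(round(beta * C))                 # active channels per task (assumed fixed)
    start = int((C - width) * (t / T) ** alpha)  # sliding-window start index
    return [1 if start <= c < start + width else 0 for c in range(C)]


def overlap(m1: List[int], m2: List[int]) -> int:
    """Number of channels two tasks share."""
    return sum(a & b for a, b in zip(m1, m2))


def apply_mask(features: List[List[float]], mask: List[int]) -> List[List[float]]:
    """Zero out inactive channels; features are indexed as [channel][position]."""
    return [[x * m for x in channel] for channel, m in zip(features, mask)]


if __name__ == "__main__":
    T, C, beta, alpha = 1000, 64, 0.5, 2.0
    # Late-stage neighbours (low t) share the whole window ...
    print(overlap(routing_mask(0, T, C, beta, alpha), routing_mask(100, T, C, beta, alpha)))     # 32
    # ... while early-stage neighbours (high t) share fewer channels.
    print(overlap(routing_mask(900, T, C, beta, alpha), routing_mask(1000, T, C, beta, alpha)))  # 25
```

Because the mask is a fixed function of the timestep, applying it adds no learnable parameters, which is consistent with the abstract's claim that DTR adds none.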