Diffusion models generate highly realistic images by learning a multi-step denoising process, which naturally embodies the principles of multi-task learning (MTL). Despite this inherent connection, designing neural architectures that explicitly incorporate MTL into diffusion models remains largely unexplored. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures that establishes distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is that it builds prior knowledge of denoising tasks directly into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts the activated channels as a sliding window across timesteps, capitalizing on the strong affinity between tasks at adjacent timesteps. (2) Task Weights: During the early stages of the denoising process (higher timesteps), DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich content in earlier stages and focus on simple noise removal in later stages. Our experiments show that DTR consistently improves diffusion models' performance across different evaluation protocols without adding extra parameters, and that it also accelerates training convergence. Finally, we show that our architectural approach is complementary to existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Notably, by leveraging this complementarity, we match the performance of DiT-XL using the smaller DiT-L while reducing training iterations from 7M to 2M.
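The channel routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `routing_mask`, the parameters `beta` (fraction of channels active per timestep) and `alpha` (how strongly the window shift is skewed toward higher timesteps), and the power-law shift rule are all assumptions chosen only to reproduce the two stated properties. A fixed-width window of active channels slides across the channel axis as the timestep grows (task affinity), and with `alpha > 1` the window moves faster at higher timesteps, so those timesteps share fewer channels with distant ones (task weights).

```python
import numpy as np


def routing_mask(t, num_timesteps, num_channels, beta=0.5, alpha=2.0):
    """Return a binary channel mask for timestep t (1-indexed).

    Hypothetical sketch: `beta`, `alpha` and the shift rule below are
    illustrative assumptions, not values or formulas taken from the paper.
    """
    if num_timesteps < 2:
        raise ValueError("num_timesteps must be at least 2")
    if not 1 <= t <= num_timesteps:
        raise ValueError("t must lie in [1, num_timesteps]")

    # Number of channels kept active at every timestep.
    window = int(round(beta * num_channels))

    # Normalised position of t in [0, 1]; alpha > 1 makes the window
    # move slowly at low timesteps and quickly at high timesteps.
    frac = (t - 1) / (num_timesteps - 1)
    start = int(np.floor((num_channels - window) * frac ** alpha))

    mask = np.zeros(num_channels, dtype=np.float32)
    mask[start:start + window] = 1.0
    return mask


def apply_routing(features, t, num_timesteps, **kwargs):
    """Mask a (batch, channels, height, width) feature map for timestep t."""
    mask = routing_mask(t, num_timesteps, features.shape[1], **kwargs)
    # Broadcast the per-channel mask over batch and spatial dimensions.
    return features * mask[None, :, None, None]


def shared_channels(t_a, t_b, num_timesteps, num_channels, **kwargs):
    """Count channels active at both timesteps."""
    a = routing_mask(t_a, num_timesteps, num_channels, **kwargs)
    b = routing_mask(t_b, num_timesteps, num_channels, **kwargs)
    return int((a * b).sum())
```

Because the mask is a fixed function of the timestep, applying it introduces no learnable parameters, which is consistent with the claim that DTR adds none. In this sketch, `shared_channels(500, 501, 1000, 16)` is close to the full window while `shared_channels(1, 1000, 1000, 16)` is zero, so nearby denoising tasks share most of their pathway and distant ones share none.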