We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small set of learnable prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the added parameters but stems from its dynamic gating mechanism, which selects and combines a subset of the learnable prompts at every timestep (i.e., at each reverse denoising step). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every timestep with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the original dataset already used for pre-training, even in a scenario where significant improvements are typically not expected due to model convergence. Notably, DMP improves the FID of a converged DiT-L/2 by 10.38% on FFHQ with only a 1.43% parameter increase and 50K additional training iterations.
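The mixture-of-prompts idea can be illustrated with a minimal sketch. This is not the authors' implementation: the pool size, prompt length, gating network, and the sinusoidal timestep embedding below are illustrative assumptions; in practice the prompts and gate would be the only trainable parameters, optimized while the diffusion backbone stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

K, L, D = 5, 4, 8  # assumed: pool of K prompts, each L tokens of dim D
prompt_pool = rng.normal(size=(K, L, D))  # learnable prompt pool (trainable)
W_gate = rng.normal(size=(D, K))          # gating projection (trainable)

def timestep_embedding(t, dim):
    # Sinusoidal embedding of the diffusion timestep, as commonly used
    # in diffusion models (assumed here as the gate's input feature).
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    ang = t * freqs
    return np.concatenate([np.cos(ang), np.sin(ang)])

def mixture_of_prompts(t):
    # Gate: softmax over the prompt pool, conditioned on the timestep,
    # so different denoising steps emphasize different prompts.
    emb = timestep_embedding(t, D)
    logits = emb @ W_gate
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Weighted combination of prompts -> one (L, D) prompt for this step.
    return np.tensordot(w, prompt_pool, axes=1)

combined = mixture_of_prompts(t=100)
```

The combined `(L, D)` prompt would then be prepended to the frozen model's input tokens at that denoising step, so only the pool and gate add parameters.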