We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point is the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense but is difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well fitted to their own tasks and the auxiliary tasks combined capture the target well. We also connect the practice of checkpoint merging in AI art creation to an approximation of our KL-barycenter-based fusion approach. However, our fusion method differs in key aspects, allowing generation of new populations, as we illustrate in experiments.
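To make the link between KL barycenters and scores concrete, the sketch below works through a toy one-dimensional case. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the barycenter is the minimizer of the weighted sum of KL(q || p_i), so that its density is proportional to the product of the auxiliary densities raised to their weights and its score is the weighted sum of the auxiliary scores. It also works with static Gaussian densities rather than diffusion processes, takes the weights as given rather than learning them, and uses helper names (`gaussian_score`, `fused_score`, `barycenter_params`) that are hypothetical.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of N(mean, std^2) at x."""
    return -(x - mean) / std**2

def fused_score(x, weights, means, stds):
    """Weighted sum of the auxiliary scores: sum_i w_i * score_i(x)."""
    return sum(w * gaussian_score(x, m, s)
               for w, m, s in zip(weights, means, stds))

def barycenter_params(weights, means, stds):
    """Mean and std of the density proportional to prod_i p_i^{w_i},
    where each p_i is a 1-D Gaussian (assumed form of the barycenter)."""
    precisions = np.asarray(weights) / np.asarray(stds) ** 2
    total = precisions.sum()
    mean = (precisions * np.asarray(means)).sum() / total
    return mean, 1.0 / np.sqrt(total)

# Illustrative (made-up) auxiliary populations and weights summing to one.
weights = [0.3, 0.7]
means = [-1.0, 2.0]
stds = [1.0, 0.5]

x = np.linspace(-3.0, 3.0, 7)
bary_mean, bary_std = barycenter_params(weights, means, stds)

# The fused score should coincide with the score of the barycenter Gaussian.
fused = fused_score(x, weights, means, stds)
direct = gaussian_score(x, bary_mean, bary_std)
```

In this toy setting `fused` and `direct` agree, which is why fitting the barycenter weights can be phrased as a regression on scores: the weights enter the fused score linearly.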