Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement: it significantly reduces training costs and enables personalized AI applications. However, when we examine the training dynamics of DMs, we observe an unanticipated phenomenon: during training, image fidelity first improves, then unexpectedly deteriorates as noisy patterns emerge, and only recovers later, accompanied by severe overfitting. We term the stage in which noisy patterns appear in the generated images the corruption stage. To understand this corruption stage, we first theoretically model the one-shot fine-tuning scenario and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of the corruption stage: a narrowed learning distribution inherent in few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) to DMs with variational inference to implicitly broaden the learned distribution, and show that the learning objective of the BNNs can naturally be regarded as an expectation of the diffusion loss together with a further regularization toward the pretrained DMs. This approach is highly compatible with existing few-shot fine-tuning methods for DMs and introduces no extra inference cost. Experimental results demonstrate that our method significantly mitigates corruption and improves the fidelity, quality, and diversity of the generated images in both object-driven and subject-driven generation tasks.
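To make the described objective concrete, below is a minimal sketch (not the paper's released code) of one way to realize it: a factorized Gaussian variational posterior over the fine-tuned weights, trained with an expectation of the diffusion loss under sampled weights plus a KL regularization toward the pretrained weights. The names `BayesianLinear`, `prior_sigma`, `init_log_sigma`, and `kl_weight` are illustrative assumptions, not names from the paper.

```python
# Sketch: variational (Bayesian) weights for a fine-tuned layer of a DM,
# regularized toward the pretrained solution. Assumptions labeled above.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer whose weights follow q(w) = N(mu, sigma^2), initialized
    at the pretrained weights; the prior is a Gaussian centered on them."""

    def __init__(self, pretrained: nn.Linear, init_log_sigma: float = -6.0):
        super().__init__()
        # Prior mean: frozen copy of the pretrained weights.
        self.register_buffer("w_pre", pretrained.weight.detach().clone())
        # Variational parameters, initialized at the pretrained solution.
        self.mu = nn.Parameter(pretrained.weight.detach().clone())
        self.log_sigma = nn.Parameter(torch.full_like(self.mu, init_log_sigma))
        self.bias = pretrained.bias  # kept deterministic for simplicity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I), so
        # each forward pass samples weights from a broadened distribution.
        w = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return F.linear(x, w, self.bias)

    def kl_to_pretrained(self, prior_sigma: float = 1e-2) -> torch.Tensor:
        # KL( N(mu, sigma^2) || N(w_pre, prior_sigma^2) ), summed over
        # elements; this is the regularization toward the pretrained DM.
        var = (2.0 * self.log_sigma).exp()
        return 0.5 * (
            (var + (self.mu - self.w_pre) ** 2) / prior_sigma**2
            - 1.0
            + 2.0 * (math.log(prior_sigma) - self.log_sigma)
        ).sum()
```

Training would then combine the usual noise-prediction loss with the KL term, e.g. `loss = diffusion_mse + kl_weight * sum(m.kl_to_pretrained() for m in bayesian_layers)`. One plausible way to obtain the claimed zero extra inference cost (an assumption here, not a detail stated in the abstract) is to use the posterior mean `mu` as deterministic weights at sampling time.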