Score-based distillation methods (e.g., variational score distillation) train one-step diffusion models by first pre-training a teacher score model and then distilling it into a one-step student. However, the gradient estimator used during distillation typically suffers from two sources of bias: (1) biased teacher supervision caused by score estimation error incurred during pre-training, and (2) the student model's own score estimation error during distillation. Both biases can degrade the quality of the resulting one-step diffusion model. To address this, we propose DiffRatio, a new framework for training one-step diffusion models: instead of estimating the teacher and student scores independently and then taking their difference, we directly estimate the score difference as the gradient of a learned log density ratio between the student and data distributions across diffusion time steps. This approach greatly simplifies the training pipeline, substantially reduces gradient estimation bias, and improves one-step generation quality. It also shrinks the auxiliary network footprint by replacing two full score networks with a single lightweight density-ratio network, improving computational and memory efficiency. DiffRatio achieves competitive one-step generation results on CIFAR-10 and ImageNet (64x64 and 512x512), outperforming most teacher-supervised distillation approaches.
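The core identity behind this framework is that the gradient of a log density ratio equals a difference of scores: d/dx log(p_student(x)/p_data(x)) = s_student(x) - s_data(x). A minimal toy sketch (not the paper's actual training objective; all names and settings below are illustrative) fits the log ratio between two 1-D Gaussians with a binary classifier, logistic regression on quadratic features, which can represent the exact Gaussian log ratio, and compares the gradient of the fitted ratio to the analytic score difference:

```python
import numpy as np

# Toy illustration of the density-ratio trick: the gradient of the
# learned log ratio log(p_student / p_data) recovers the score
# difference s_student - s_data without estimating either score alone.
# Distributions, features, and hyperparameters here are illustrative.

rng = np.random.default_rng(0)
n = 200_000
x_data = rng.normal(0.0, 1.0, n)   # stand-in "data" distribution N(0, 1)
x_stud = rng.normal(0.5, 1.2, n)   # stand-in "student" distribution N(0.5, 1.2^2)

x = np.concatenate([x_data, x_stud])
y = np.concatenate([np.zeros(n), np.ones(n)])        # label 1 = student sample
phi = np.stack([np.ones_like(x), x, x**2], axis=1)   # features [1, x, x^2]

# Full-batch logistic regression: with equal class priors, the optimal
# logit w @ phi(x) equals log p_student(x) - log p_data(x).
w = np.zeros(3)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-phi @ w))
    w -= 0.5 * (phi.T @ (p - y)) / len(y)

def learned_score_diff(xq):
    # d/dx of the fitted log ratio w0 + w1*x + w2*x^2
    return w[1] + 2.0 * w[2] * xq

def analytic_score_diff(xq):
    # s_student - s_data for N(0.5, 1.2^2) vs N(0, 1)
    return -(xq - 0.5) / 1.2**2 + xq

for xq in (-1.0, 0.0, 1.0):
    print(xq, learned_score_diff(xq), analytic_score_diff(xq))
```

In a diffusion setting the same idea is applied per time step with a network in place of the linear model, so the score difference needed for the distillation gradient comes from one ratio model rather than from subtracting two separately estimated scores.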