Score-based distillation methods (e.g., variational score distillation) train one-step diffusion models by first pre-training a teacher score model and then distilling it into a one-step student model. However, the gradient estimator in the distillation stage usually suffers from two sources of bias: (1) biased teacher supervision due to score estimation error incurred during pre-training, and (2) the student model's score estimation error during distillation. These biases can degrade the quality of the resulting one-step diffusion model. To address this, we propose DiffRatio, a new framework for training one-step diffusion models: instead of estimating the teacher and student scores independently and then taking their difference, we directly estimate the score difference as the gradient of a learned log density ratio between the student and data distributions across diffusion time steps. This approach greatly simplifies the training pipeline, significantly reduces gradient estimation bias, and improves one-step generation quality. It also shrinks the auxiliary network by replacing two full score networks with a single lightweight density-ratio network, improving computational and memory efficiency. DiffRatio achieves competitive one-step generation results on CIFAR-10 and ImageNet (64x64 and 512x512), outperforming most teacher-supervised distillation methods. Moreover, the learned density ratio naturally serves as a verifier, enabling a principled inference-time parallel scaling scheme that further improves generation quality without external rewards or additional sequential computation.
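The core identity DiffRatio builds on can be illustrated with a toy example: for any two differentiable densities q (student) and p (data), the score difference ∇log q − ∇log p equals the gradient of the log density ratio log(q/p), so estimating the ratio directly avoids estimating two separate scores. Below is a minimal 1D Gaussian sketch of this identity; all parameter values are hypothetical, the finite-difference gradient stands in for a learned ratio network, and the best-of-K selection at the end is our illustrative reading of the verifier idea, not the paper's exact scheme:

```python
import math

# Toy 1D illustration of the identity DiffRatio relies on:
#   score_q(x) - score_p(x) = d/dx log(q(x) / p(x))
# with q = student distribution, p = data distribution (hypothetical Gaussians).

def log_gauss(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def score(x, mu, sigma):
    """d/dx log N(x; mu, sigma^2) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

MU_Q, SIG_Q = 1.0, 2.0   # hypothetical student q
MU_P, SIG_P = 0.0, 1.0   # hypothetical data p

def log_ratio(x):
    """log q(x) - log p(x): the quantity DiffRatio learns directly."""
    return log_gauss(x, MU_Q, SIG_Q) - log_gauss(x, MU_P, SIG_P)

# Gradient of the log ratio (central finite differences, standing in for
# the learned ratio network's gradient) matches the score difference.
x, h = 0.7, 1e-5
grad_ratio = (log_ratio(x + h) - log_ratio(x - h)) / (2 * h)
score_diff = score(x, MU_Q, SIG_Q) - score(x, MU_P, SIG_P)

# Verifier-style selection (illustrative reading): among candidate
# one-step samples, keep the one with the lowest log ratio log(q/p),
# i.e. the sample most plausible under the data relative to the student.
candidates = [-0.5, 0.3, 1.8]            # hypothetical one-step samples
best = min(candidates, key=log_ratio)
```

Because the ratio is learned once across diffusion time steps, the same network supplies both the distillation gradient (via its input gradient) and the inference-time verifier score, which is what removes the need for two full score networks.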