Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach to tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low-NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and an LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 72% in the 1 NFE setting on CIFAR-10. On ImageNet 64$\times$64, our improved rectified flow outperforms state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings, and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.
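To illustrate what a U-shaped timestep distribution means in practice, the sketch below samples training timesteps $t \in [0, 1]$ from the arcsine (Beta(1/2, 1/2)) distribution, which places more probability mass near the endpoints $t = 0$ and $t = 1$ than near the middle. This is a minimal, hypothetical stand-in: the exact density used in the paper may differ, and the function name `sample_u_shaped` is introduced here for illustration only.

```python
import numpy as np

def sample_u_shaped(n, rng):
    """Sample n timesteps in [0, 1] from a U-shaped (arcsine) distribution.

    If U ~ Uniform(0, 1), then t = sin^2(pi * U / 2) follows the arcsine
    law, whose density 1 / (pi * sqrt(t * (1 - t))) diverges at both
    endpoints -- concentrating training signal near t = 0 and t = 1.
    """
    u = rng.uniform(0.0, 1.0, size=n)
    return np.sin(np.pi * u / 2.0) ** 2

# Compare endpoint vs. midpoint mass against the uniform baseline.
rng = np.random.default_rng(0)
t = sample_u_shaped(100_000, rng)
frac_tails = ((t < 0.1) | (t > 0.9)).mean()   # uniform baseline: 0.2
frac_mid = ((t > 0.45) & (t < 0.55)).mean()   # uniform baseline: 0.1
print(f"tail mass: {frac_tails:.3f}, mid mass: {frac_mid:.3f}")
```

A U-shaped sampler like this would replace the usual uniform draw of $t$ inside the Reflow training loop; the distribution is symmetric about $t = 0.5$, so neither endpoint of the trajectory is favored over the other.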