Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce additional hyperparameters and are prone to discretization errors. While continuous-time formulations can mitigate these issues, their success has been limited by training instability. To address this, we propose a simplified theoretical framework that unifies previous parameterizations of diffusion models and CMs, identifying the root causes of instability. Based on this analysis, we introduce key improvements in diffusion process parameterization, network architecture, and training objectives. These changes enable us to train continuous-time CMs at an unprecedented scale, reaching 1.5B parameters on ImageNet 512x512. Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64x64, and 1.88 on ImageNet 512x512, narrowing the gap in FID scores with the best existing diffusion models to within 10%.
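The "two sampling steps" mentioned above refer to the standard multistep consistency sampling procedure: map pure noise directly to a clean estimate, re-noise to an intermediate level, and denoise once more. A minimal sketch of that procedure follows; the function `f`, the noise levels `sigma_max` and `t_mid`, and the toy consistency function are all illustrative placeholders, not the paper's actual model or settings.

```python
import numpy as np

def two_step_consistency_sample(f, shape, sigma_max=80.0, t_mid=1.1, rng=None):
    """Two-step consistency-model sampling (illustrative sketch only).

    f(x, t) maps a noisy sample at noise level t directly to a clean
    sample estimate -- the defining property of a consistency model.
    sigma_max and t_mid are placeholder noise levels, not tuned values.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Step 1: map pure noise at the maximum noise level straight to data.
    x = sigma_max * rng.standard_normal(shape)
    x0 = f(x, sigma_max)
    # Step 2: re-noise the estimate to an intermediate level, denoise again.
    x_mid = x0 + t_mid * rng.standard_normal(shape)
    return f(x_mid, t_mid)

# Toy consistency function: pretends the data distribution is N(0, I),
# so the posterior mean of x0 given x_t = x0 + t*eps is x_t / (1 + t**2).
toy_f = lambda x, t: x / (1.0 + t ** 2)
sample = two_step_consistency_sample(toy_f, (4,))
```

The second step is what distinguishes multistep consistency sampling from one-step generation: re-noising and denoising once more trades a little extra compute for noticeably better sample quality.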