Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. This objective, however, makes training resource-intensive: as of 2024, for example, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative scheme for training CMs that vastly improves the efficiency of building such models. Specifically, by expressing CM trajectories via a particular differential equation, we argue that diffusion models can be viewed as a special case of CMs under a specific discretization. We can thus fine-tune a consistency model starting from a pre-trained diffusion model, progressively approximating the full consistency condition to stronger degrees over the course of training. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly reduced training times while also improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU, matching the quality of Consistency Distillation trained for hundreds of GPU hours. Owing to this computational efficiency, we investigate the scaling behavior of CMs under ECT, showing that they appear to obey classic power-law scaling, hinting at their ability to improve both efficiency and performance at larger scales. Code is available at https://github.com/locuslab/ect.
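To make the progressive tightening idea concrete, the following is a minimal toy sketch (not the paper's exact formulation): the schedule, the stand-in "model," and the trajectory parameterization below are all illustrative assumptions. It shows the core mechanic of starting with two noise levels `r` and `t` close together (the diffusion limit) and widening the gap over training, penalizing disagreement between the model's outputs at the two levels.

```python
import random

def r_schedule(step, total_steps, t):
    """Hypothetical annealing schedule: the ratio r/t shrinks from ~1
    (nearly adjacent noise levels, as in diffusion training) toward 0
    (full consistency condition) as training progresses."""
    ratio = max(0.0, 1.0 - step / total_steps)
    return t * ratio

def f(x, t, w):
    """Stand-in consistency 'model': a trivial parametric map of (x, t)."""
    return x - w * t * x

def consistency_loss(x0, t, r, w):
    """Squared gap between predictions at noise levels t and r > or = 0 on
    the same toy trajectory x_s = x0 + s * eps (one shared eps per sample).
    In practice the target branch would use a stop-gradient / EMA copy."""
    eps = random.gauss(0.0, 1.0)
    x_t = x0 + t * eps
    x_r = x0 + r * eps
    target = f(x_r, r, w)  # target prediction at the earlier noise level
    pred = f(x_t, t, w)    # prediction at the later noise level
    return (pred - target) ** 2
```

When `r == t` the two predictions coincide and the loss is exactly zero, mirroring the claim that a diffusion model already satisfies the consistency condition in the infinitesimal-gap limit; annealing `r` toward 0 then asks the model to satisfy it over ever-larger spans of the trajectory.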