Consistency models were proposed to mitigate the high computational cost of sampling from diffusion models; they enable single-step sampling while attaining state-of-the-art empirical performance. During training, a consistency model learns a sequence of consistency functions, each capable of mapping any point at any time step of the diffusion process back to its starting point. Despite this empirical success, a comprehensive theoretical understanding of consistency training remains elusive. This paper takes a first step towards establishing theoretical underpinnings for consistency models. We demonstrate that, to generate samples within $\varepsilon$ of the target in distribution (measured by some Wasserstein metric), it suffices for the number of steps in consistency learning to exceed the order of $d^{5/2}/\varepsilon$, where $d$ is the data dimension. Our theory offers rigorous insights into the validity and efficacy of consistency models, illuminating their utility in downstream inference tasks.
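As a rough sketch of these statements in symbols (the notation is an assumption introduced for illustration, not taken from the paper): let $x_1, \dots, x_T$ denote the points of one trajectory of the diffusion process at time steps $1, \dots, T$, with $x_1$ the starting point; let $f_1, \dots, f_T$ be the learned consistency functions; let $p_{\mathrm{data}}$ be the target distribution; and let $W$ be the Wasserstein metric in question. Informally, the training goal and the sampling guarantee then read

$$
f_t(x_t) \approx x_1 \quad \text{for all } 1 \le t \le T,
\qquad
W\big(\mathrm{law}(f_T(x_T)),\; p_{\mathrm{data}}\big) \le \varepsilon
\quad \text{whenever} \quad T \gtrsim \frac{d^{5/2}}{\varepsilon}.
$$

In the second statement $x_T$ is a fresh draw from the noise distribution, so a single evaluation of $f_T$ produces a sample, and $\gtrsim$ hides factors that are not specified here.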