Score-matching generative models have proven successful at sampling from complex high-dimensional data distributions. In many applications, this distribution is believed to concentrate on a manifold of much lower dimension $d$ embedded in the $D$-dimensional ambient space; this is known as the manifold hypothesis. The best-known convergence guarantees to date are either linear in $D$ or polynomial (superlinear) in $d$; the latter exploit a novel integration scheme for the backward SDE. We take the best of both worlds and show that the number of steps diffusion models require to converge in Kullback-Leibler~(KL) divergence is linear (up to logarithmic factors) in the intrinsic dimension $d$. Moreover, we show that this linear dependence is sharp.
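For concreteness, one standard formulation underlying such guarantees (a sketch under common conventions; the Ornstein--Uhlenbeck normalization below is our assumption, not fixed by the abstract) takes the forward noising process to be
\[
\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t, \qquad X_0 \sim p_{\mathrm{data}}, \quad t \in [0,T],
\]
and, writing $p_t$ for the law of $X_t$, samples by discretizing the time reversal (the backward SDE)
\[
\mathrm{d}Y_t = \bigl(Y_t + 2\,\nabla \log p_{T-t}(Y_t)\bigr)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}\bar{B}_t, \qquad Y_0 \sim p_T,
\]
with the score $\nabla \log p_{T-t}$ replaced by a learned estimate. The number of discretization steps needed for the law of the output to be close to $p_{\mathrm{data}}$ in KL is the quantity bounded here linearly in $d$, up to logarithmic terms.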