To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and their variants, rely on discretizing the reverse SDE or its equivalent ODE. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, which yields low per-subproblem complexity but requires a large number of segments (i.e., subproblems), a trade-off that is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with a strongly log-concave target. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), to solve these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further establish convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD achieves $\epsilon$ target error with $\tilde{\mathcal O}(d^{1/2}\epsilon^{-1})$ complexity under mild conditions, and RTK-MALA enjoys an $\mathcal{O}(d^{2}\log(d/\epsilon))$ convergence rate under slightly stricter conditions. These theoretical results improve on the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
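To make the inner sampler concrete, the following is a minimal sketch of generic MALA applied to a strongly log-concave target. It is not the RTK-MALA algorithm itself: the abstract does not specify the RTK targets, so an isotropic Gaussian stands in for one subproblem, and the names `mala`, `log_p`, `grad_log_p`, `mu`, and the step size are illustrative assumptions.

```python
import math
import random


def mala(log_p, grad_log_p, x0, step, n_steps, rng):
    """Generic MALA chain for a density known up to a constant.

    Returns the list of visited states and the empirical acceptance rate.
    """
    x = list(x0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        gx = grad_log_p(x)
        # Langevin proposal: gradient step plus Gaussian noise of variance 2 * step.
        y = [xi + step * gi + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, gx)]
        gy = grad_log_p(y)
        # Log proposal densities q(y | x) and q(x | y), up to a shared constant.
        log_q_fwd = -sum((yi - xi - step * gi) ** 2
                         for xi, yi, gi in zip(x, y, gx)) / (4.0 * step)
        log_q_bwd = -sum((xi - yi - step * gi) ** 2
                         for xi, yi, gi in zip(x, y, gy)) / (4.0 * step)
        # Metropolis-Hastings correction removes the discretization bias.
        log_alpha = log_p(y) - log_p(x) + log_q_bwd - log_q_fwd
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_steps


# Stand-in strongly log-concave target (assumption): N(mu, I) in two dimensions.
mu = [1.0, -2.0]


def log_p(x):
    return -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def grad_log_p(x):
    return [-(xi - mi) for xi, mi in zip(x, mu)]


rng = random.Random(0)
chain, acc = mala(log_p, grad_log_p, [0.0, 0.0], step=0.5, n_steps=5000, rng=rng)
kept = chain[500:]  # discard burn-in
mean = [sum(s[i] for s in kept) / len(kept) for i in range(2)]
print("acceptance rate:", acc, "sample mean:", mean)
```

In this sketch the accept/reject step is what distinguishes MALA from an unadjusted Langevin discretization; the sample mean of the retained states should land near the stand-in `mu`.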