Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
翻译:扩散模型已成为图像与科学数据生成建模的主流方法。由于此类模型的训练与评估成本高昂,降低其推理开销仍是重要目标。受近期通过并行采样技术加速扩散模型的实证成果启发,我们提出将采样过程划分为$\mathcal{O}(1)$个块,每个块内采用可并行的皮卡迭代。严格的理论分析表明,该算法实现了$\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$的整体时间复杂度,这是首个在数据维度$d$上具有可证明次线性复杂度的实现方案。我们的分析基于广义版本的Girsanov定理,且同时兼容SDE与概率流ODE两种实现方式。研究结果揭示了在快速演进的大内存GPU集群上实现高维数据快速高效采样的潜力。