This book presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the book discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
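To make the claim that sampling amounts to solving a differential equation more concrete, here is a minimal sketch on a one-dimensional toy problem. Everything specific in it is an assumption made for illustration and is not taken from the book: the data distribution is a Gaussian N(mu, s^2), the forward path is the linear interpolation x_t = (1 - t) x_0 + t eps with eps ~ N(0, 1), and the velocity field is written in closed form rather than learned by a network. The names `velocity` and `sample_ode` are hypothetical.

```python
import numpy as np


def velocity(x, t, mu, s):
    """Closed-form velocity for Gaussian data N(mu, s^2) under the assumed
    path x_t = (1 - t) * x_0 + t * eps, with eps ~ N(0, 1)."""
    mean_t = (1.0 - t) * mu                    # mean of the intermediate p_t
    var_t = (1.0 - t) ** 2 * s ** 2 + t ** 2   # variance of p_t
    # v(x, t) = E[eps - x_0 | x_t = x] for jointly Gaussian (x_0, eps).
    return -mu + (t - (1.0 - t) * s ** 2) / var_t * (x - mean_t)


def sample_ode(x1, mu, s, n_steps=1000):
    """Integrate dx/dt = v(x, t) from t = 1 (noise) to t = 0 (data)
    with explicit Euler steps."""
    x = np.asarray(x1, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        x = x - dt * velocity(x, t, mu, s)     # step backward in time
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)            # samples from the prior N(0, 1)
data = sample_ode(noise, mu=2.0, s=0.5)
print(data.mean(), data.std())                 # should be close to mu and s
```

In this toy setting the exact flow sends a noise sample z to mu + s * z, so the Euler output can be checked against that map. In an actual diffusion model the closed-form `velocity` would be replaced by a trained network, and the Euler loop by one of the more efficient solvers the book discusses.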