Diffusion distillation provides an effective approach for learning lightweight, few-step diffusion models that generate efficiently. However, evaluating their generalization remains challenging: theoretical metrics are often impractical for high-dimensional data, while no practical metrics rigorously measure generalization. In this work, we bridge this gap by introducing the probability flow distance (\texttt{PFD}), a theoretically grounded and computationally efficient metric for measuring generalization. Specifically, \texttt{PFD} quantifies the distance between two distributions by comparing the noise-to-data mappings induced by their probability flow ODEs. Applying \texttt{PFD} in the diffusion distillation setting, we empirically uncover several key generalization behaviors, including: (1) a quantitative scaling behavior from memorization to generalization, (2) epoch-wise double-descent training dynamics, and (3) a bias-variance decomposition. Beyond these insights, our work lays a foundation for generalization studies in diffusion distillation and connects them with diffusion training.
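For intuition, the following is a minimal sketch of how such a metric could be formalized; the mapping notation $\Psi_p$ and the choice of a root-mean-squared $\ell_2$ discrepancy are illustrative assumptions, not the definition fixed by this abstract. Let $\Psi_p \colon \mathbb{R}^d \to \mathbb{R}^d$ denote the noise-to-data mapping obtained by integrating the probability flow ODE associated with a distribution $p$ from the terminal noise level back to data. A natural distance between two distributions $p$ and $q$ is then
\[
\texttt{PFD}(p, q) \;=\; \Big( \mathbb{E}_{x_T \sim \mathcal{N}(0,\, \sigma_T^2 I)} \big\| \Psi_p(x_T) - \Psi_q(x_T) \big\|_2^2 \Big)^{1/2},
\]
which can be estimated efficiently because both mappings are evaluated on the same shared noise samples.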