When a diffusion model does not memorize its training data, how exactly does it generalize? A quantitative understanding of the distribution it generates would benefit, for example, assessments of the model's performance in downstream applications. We therefore explicitly characterize what a diffusion model generates by proposing a log-density ridge manifold and quantifying how the generated data relate to this manifold as the inference dynamics progress. More precisely, inference undergoes a reach-align-slide process centered on the ridge manifold: trajectories first reach a neighborhood of the manifold, then align as they are pushed toward or away from the manifold in normal directions, and finally slide along the manifold in tangent directions. Within this general behavior, different training errors lead to different normal and tangent motions, which can be quantified, and these detailed motions characterize when inter-mode generations emerge. A more detailed understanding of the training dynamics leads to a more accurate quantification of the generation inductive bias. As an example, we consider a random feature model, for which we explicitly illustrate how a diffusion model's inductive biases originate as a composition of architectural bias and training accuracy, and how they evolve with the inference dynamics. Experiments on synthetic multimodal distributions and MNIST latent diffusion support the predicted directional effects in both low and high dimensions.
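The split into normal and tangent motion can be made concrete on a toy density. The sketch below is an illustration only, not the paper's construction: it assumes a common density-ridge convention in which the normal direction at a point is the eigenvector of the log-density Hessian with the most negative eigenvalue, and the tangent direction is the remaining eigenvector. The two-mode Gaussian mixture, the constants `MEANS` and `SIGMA`, and all function names are hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical toy target: equal-weight mixture of two isotropic Gaussians in 2D.
MEANS = np.array([[-2.0, 0.0], [2.0, 0.0]])  # mode centers (assumed for illustration)
SIGMA = 1.0                                  # shared standard deviation (assumed)


def responsibilities(x):
    """Posterior weight of each mixture component at point x."""
    logits = -np.sum((x - MEANS) ** 2, axis=1) / (2 * SIGMA**2)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()


def score(x):
    """Gradient of the log-density at x."""
    w = responsibilities(x)
    return (w @ MEANS - x) / SIGMA**2


def hessian(x):
    """Hessian of the log-density at x: -I/sigma^2 + Cov_w(means)/sigma^4."""
    w = responsibilities(x)
    m = w @ MEANS
    cov = (MEANS * w[:, None]).T @ MEANS - np.outer(m, m)
    return -np.eye(2) / SIGMA**2 + cov / SIGMA**4


def normal_tangent_split(x):
    """Split the score into normal and tangent parts (assumed ridge convention)."""
    _, evecs = np.linalg.eigh(hessian(x))  # eigenvalues in ascending order
    n = evecs[:, 0]  # most negative curvature: treated as the normal direction
    t = evecs[:, 1]  # remaining direction: treated as the tangent direction
    s = score(x)
    return (s @ n) * n, (s @ t) * t


if __name__ == "__main__":
    x = np.array([0.5, 0.3])  # a point slightly off the segment joining the modes
    normal, tangent = normal_tangent_split(x)
    print("normal part of score: ", normal)
    print("tangent part of score:", tangent)
```

For this mixture the ridge under the assumed convention is the line through the two means, so the normal part of the score pulls a point back toward that line while the tangent part moves it along the line toward the nearer mode.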