We study a data-dependent notion of diffusion-model generalization: when a model does not memorize the training set, where do its generated samples go relative to the geometry induced by the data? To answer this, we introduce a time-dependent family of log-density ridge manifolds constructed from the smoothed empirical distribution, and use it to characterize reverse-time inference. Our main result shows that generated samples evolve by a reach-align-slide mechanism: they first enter a neighborhood of the ridge, then their distance to the ridge is controlled by the normal component of training error, and finally their motion along the ridge is controlled by the tangential component. We further connect this geometric picture to training dynamics through directional decompositions of the learned error, and make this link explicit for random feature models, where architectural bias and optimization error can be separated quantitatively. Experiments on synthetic multimodal data and MNIST latent diffusion support the predicted geometric behavior in both low and high dimensions.
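The normal/tangential split mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's construction. It assumes that the smoothed empirical distribution is a Gaussian mixture centered at the training points with a single noise scale `sigma`, and that the ridge's normal directions are the eigenvectors of the log-density Hessian with the smallest eigenvalues, which is a common density-ridge convention. The function names `smoothed_score_and_hessian` and `split_normal_tangential` are hypothetical.

```python
import numpy as np

def smoothed_score_and_hessian(x, data, sigma):
    """Score and Hessian of log p at x, where p is the Gaussian-smoothed
    empirical distribution (1/n) * sum_i N(x; x_i, sigma^2 I).
    Assumption: one isotropic noise scale, no time-dependent rescaling."""
    diffs = data - x                                   # shape (n, d)
    logits = -np.sum(diffs**2, axis=1) / (2.0 * sigma**2)
    w = np.exp(logits - logits.max())                  # stable softmax weights
    w /= w.sum()
    mu = w @ data                                      # posterior mean of the data point
    score = (mu - x) / sigma**2                        # grad log p(x)
    centered = data - mu
    cov = (centered * w[:, None]).T @ centered         # posterior covariance
    hess = cov / sigma**4 - np.eye(x.size) / sigma**2  # Hessian of log p(x)
    return score, hess

def split_normal_tangential(v, hess, ridge_dim):
    """Split v into components normal and tangential to a ridge of dimension
    ridge_dim. Assumption: the normal space is spanned by the Hessian
    eigenvectors with the d - ridge_dim smallest eigenvalues."""
    _, eigvecs = np.linalg.eigh(hess)                  # eigenvalues in ascending order
    normal_basis = eigvecs[:, : hess.shape[0] - ridge_dim]
    v_normal = normal_basis @ (normal_basis.T @ v)
    return v_normal, v - v_normal

# Toy data: training points on the x-axis, so the 1-D ridge is the x-axis itself.
data = np.stack([np.linspace(-1.0, 1.0, 21), np.zeros(21)], axis=1)
x = np.array([0.0, 0.3])                               # query point off the ridge
score, hess = smoothed_score_and_hessian(x, data, sigma=0.2)
err = np.array([1.0, 2.0])                             # stand-in for a learned-score error
err_normal, err_tangential = split_normal_tangential(err, hess, ridge_dim=1)
```

In this toy setting the score at `x` points straight back toward the x-axis, and the error vector splits into a part across the ridge and a part along it, which are the two quantities the abstract ties to distance from the ridge and motion along it.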