We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time $\tau$ for an eigenmode or Fourier mode with target variance $\lambda$ to match that variance scales as $\tau \propto \lambda^{-1}$, so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates, accelerating but not eliminating the bias, whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm that the spectral law persists in deep MLP-based U-Nets. Convolutional U-Nets, however, display rapid, near-simultaneous emergence of many modes, implicating local convolution in reshaping the learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
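As a rough illustration of the inverse-variance scaling, the following sketch considers a deliberately simplified setting that is an assumption of this example, not the paper's full derivation: a linear denoiser trained at a single noise level $\sigma$, diagonal in the data eigenbasis, with per-mode loss $(w-1)^2\lambda + w^2\sigma^2$. Gradient flow on this loss has a closed-form solution whose rate is $2\eta(\lambda+\sigma^2)$, so for $\lambda \gg \sigma^2$ the convergence time of the weight scales roughly as $\lambda^{-1}$. The function names `denoiser_weight` and `time_to_fraction` and the parameter `eta` are hypothetical, and the sketch tracks the denoiser weight as a proxy rather than the generated variance itself.

```python
import numpy as np

def denoiser_weight(lam, sigma, t, eta=1.0, w0=0.0):
    """Closed-form gradient-flow solution for one mode's denoiser weight.

    Assumed per-mode loss: (w - 1)^2 * lam + w^2 * sigma^2,
    so dw/dt = -2 * eta * ((lam + sigma^2) * w - lam).
    """
    rate = 2.0 * eta * (lam + sigma**2)
    w_star = lam / (lam + sigma**2)  # optimal (Wiener-filter) weight
    return w_star + (w0 - w_star) * np.exp(-rate * t)

def time_to_fraction(lam, sigma, frac=0.9, eta=1.0):
    """Time for the weight, starting from w0 = 0, to reach `frac` of its optimum."""
    return -np.log(1.0 - frac) / (2.0 * eta * (lam + sigma**2))

# Illustrative variances spanning three orders of magnitude (assumed values).
lams = np.array([100.0, 10.0, 1.0, 0.1])
sigma = 0.01
taus = time_to_fraction(lams, sigma)
for lam, tau in zip(lams, taus):
    print(f"lambda = {lam:7.2f}   tau = {tau:.4e}")
```

In this toy setting, each tenfold drop in variance lengthens the convergence time by close to a factor of ten, as long as the mode variance stays well above $\sigma^2$.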