High-quality samples generated with score-based reverse diffusion algorithms provide evidence that deep neural networks (DNNs) trained for denoising can learn high-dimensional densities, despite the curse of dimensionality. However, recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two denoising DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, with a surprisingly small number of training images. This strong generalization demonstrates an alignment of powerful inductive biases in the DNN architecture and/or training algorithm with properties of the data distribution. We analyze these inductive biases, demonstrating that the denoiser performs a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous image regions. We show that trained denoisers are inductively biased towards these geometry-adaptive harmonic representations by demonstrating that the representations arise even when the network is trained on image classes, such as low-dimensional manifolds, for which the harmonic basis is suboptimal. Additionally, we show that the denoising performance of the networks is near-optimal when they are trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic.
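To make "a shrinkage operation in a basis" concrete, the sketch below uses the one case where everything is available in closed form: a zero-mean Gaussian prior with covariance C, observed in white Gaussian noise of standard deviation σ. There, the MMSE denoiser is linear, f(y) = C(C + σ²I)⁻¹y; its eigenvectors form the basis, and its eigenvalues λ/(λ + σ²), all in (0, 1), are the per-coefficient shrinkage factors. The sketch also checks the standard Miyasawa/Tweedie relation between a denoiser and the score, ∇ log p_σ(y) = (f(y) − y)/σ², which is why a learned denoiser implicitly defines a density. This is an illustration under stated assumptions, not the paper's method. A trained DNN denoiser is nonlinear, so the analogous decomposition would be of its Jacobian at a particular noisy image, and the basis then adapts to that image rather than staying fixed. The random basis, power-law spectrum, and helper names below are hypothetical.

```python
import numpy as np

def gaussian_mmse_denoiser(cov: np.ndarray, sigma: float) -> np.ndarray:
    """MMSE denoiser matrix for a N(0, cov) prior under white noise of std sigma."""
    n = cov.shape[0]
    return cov @ np.linalg.inv(cov + sigma**2 * np.eye(n))

def shrinkage_factors(denoiser: np.ndarray):
    """Eigendecompose the denoiser: f(y) = V diag(s) V^T y (symmetrized against round-off)."""
    sym = 0.5 * (denoiser + denoiser.T)
    s, V = np.linalg.eigh(sym)  # eigenvalues in ascending order
    return s, V

def score_from_denoiser(denoiser: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Miyasawa/Tweedie: grad log p_sigma(y) = (f(y) - y) / sigma^2."""
    return (denoiser @ y - y) / sigma**2

rng = np.random.default_rng(0)
n = 16
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # hypothetical orthonormal basis
lam = 1.0 / (1.0 + np.arange(n)) ** 2             # hypothetical power-law spectrum
cov = Q @ np.diag(lam) @ Q.T
sigma = 0.1

A = gaussian_mmse_denoiser(cov, sigma)
s, V = shrinkage_factors(A)
expected = np.sort(lam / (lam + sigma**2))        # closed-form shrinkage factors

y = rng.standard_normal(n)
score = score_from_denoiser(A, y, sigma)
# For this prior, the noisy density is N(0, cov + sigma^2 I), whose score is known exactly.
true_score = -np.linalg.solve(cov + sigma**2 * np.eye(n), y)
```

Coefficients along high-variance directions (large λ) are kept almost intact, and those along low-variance directions are shrunk toward zero. The Gaussian case was chosen only because its basis and shrinkage factors can be checked exactly.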