How do score-based generative models (SBMs) learn a data distribution supported on a low-dimensional manifold? We investigate the score model of a trained SBM through its linear approximations and the subspaces spanned by local feature vectors. As the noise decreases during diffusion, the local dimensionality increases and becomes more varied across sample sequences. Importantly, we find that the learned vector field mixes samples through a non-conservative field within the manifold, even though it denoises with normal projections in off-manifold directions, as if an energy function existed there. At each noise level, the subspace spanned by the local features overlaps with an effective density function. These observations suggest that SBMs can flexibly mix samples with the learned score field while carefully maintaining a manifold-like structure of the data distribution.
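To make the notion of a non-conservative field concrete, the following is a minimal sketch, not the authors' procedure. It relies on one standard fact: a vector field that is the gradient of a scalar (energy) function has a symmetric Jacobian, so the antisymmetric part of a local linear approximation measures how far the field is from conservative. The two toy fields, `gaussian_score` and `rotating_field`, and the helper names are hypothetical and chosen only for illustration; a trained score network would take their place.

```python
import math

def jacobian(field, x, eps=1e-5):
    """Central-difference Jacobian J[i][j] = d field_i / d x_j at point x."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = field(xp), field(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def antisymmetric_norm(J):
    """Frobenius norm of (J - J^T) / 2; zero when the Jacobian is symmetric."""
    n = len(J)
    return math.sqrt(sum(((J[i][j] - J[j][i]) / 2) ** 2
                         for i in range(n) for j in range(n)))

def gaussian_score(x):
    # Score of a standard 2D Gaussian: gradient of log p(x) = -|x|^2 / 2.
    return [-x[0], -x[1]]

def rotating_field(x, omega=0.5):
    # Hypothetical non-conservative field: the Gaussian score plus a rotation.
    return [-x[0] - omega * x[1], -x[1] + omega * x[0]]

point = [0.3, -1.2]
conservative_gap = antisymmetric_norm(jacobian(gaussian_score, point))
rotating_gap = antisymmetric_norm(jacobian(rotating_field, point))
print(conservative_gap, rotating_gap)
```

For the rotating field the Jacobian is `[[-1, -omega], [omega, -1]]`, so the antisymmetric norm equals `omega * sqrt(2)`, while the pure Gaussian score gives a value near zero up to finite-difference error. Applying the same measurement within and normal to an estimated manifold is one possible way to probe the distinction the abstract draws, under the assumption that those directions are available.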