Score-based Generative Models (SGMs) are a leading method in generative modeling, renowned for their ability to generate high-quality samples from complex, high-dimensional data distributions. The method enjoys empirical success and is supported by rigorous theoretical convergence properties. In particular, it has been shown that SGMs can generate samples from a distribution that is close to the ground truth if the underlying score function is learned well, which suggests the success of SGMs as generative models. We provide a counter-example in this paper. Through a sample complexity argument, we present one specific setting in which the score function is learned well. Yet SGMs in this setting can only output samples that are Gaussian blurrings of training data points, mimicking the effect of kernel density estimation. This finding resonates with a series of recent findings revealing that SGMs can exhibit strong memorization effects and fail to generate new samples.
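To make the kernel density estimation analogy concrete, the following is a minimal NumPy sketch, not the construction used in this paper. It assumes a Gaussian kernel with a fixed bandwidth `sigma`; the function names `kde_score`, `kde_sample`, and `denoise` are hypothetical and chosen for illustration only. It shows that the score of a Gaussian KDE built on the training set is available in closed form, that sampling from this density amounts to picking a training point and adding Gaussian noise, and that a one-step denoiser driven by this score pulls a query point back onto the training data.

```python
import numpy as np


def kde_score(x, train, sigma):
    """Score (gradient of the log-density) of a Gaussian KDE at point x.

    x: query point of shape (d,); train: training points of shape (n, d);
    sigma: kernel bandwidth (an assumed, illustrative parameter).
    """
    diffs = train - x                                    # (n, d), y_i - x
    logits = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    logits = logits - logits.max()                       # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()                    # softmax over training points
    return (weights[:, None] * diffs).sum(axis=0) / sigma ** 2


def kde_sample(train, sigma, n_samples, rng):
    """Exact samples from the Gaussian KDE: a training point plus Gaussian noise."""
    idx = rng.integers(0, len(train), size=n_samples)
    noise = rng.standard_normal((n_samples, train.shape[1]))
    return train[idx] + sigma * noise


def denoise(x, train, sigma):
    """One-step denoiser x + sigma^2 * score(x): a weighted mean of training points."""
    return x + sigma ** 2 * kde_score(x, train, sigma)


# Illustrative toy data: three well-separated training points in 2D.
train = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
rng = np.random.default_rng(0)
samples = kde_sample(train, sigma=0.5, n_samples=1000, rng=rng)
recovered = denoise(np.array([0.1, -0.1]), train, sigma=0.5)
```

In this toy setup every sample is a blurred copy of one of the three training points, and `recovered` lies essentially on the nearest training point, which is the memorization-like behavior the abstract describes.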