Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios in which the training data come from a noisy version of the target distribution, and we present an efficiently implementable modification of the inference procedure that generates noiseless samples. Our approach is motivated by the manifold hypothesis, according to which meaningful data is concentrated around a low-dimensional manifold in a high-dimensional ambient space. The central idea is that noise manifests as low-magnitude variation in off-manifold directions, in contrast to the relevant variation of the desired distribution, which is mostly confined to on-manifold directions. We introduce the notion of an extended score and show that, in a simplified setting, it can be used to reduce small variations to zero while leaving large variations mostly unchanged. We describe how an approximation of the extended score can be computed efficiently from an approximation of the standard score, and we demonstrate its efficacy on toy problems, synthetic data, and real data.
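The intuition that noise appears as low-magnitude, off-manifold variation can be illustrated with the standard score of a Gaussian. The following is a minimal sketch under stated assumptions, not the extended score introduced here: the two-dimensional toy distribution, the variances, and the helper name `gaussian_score` are all hypothetical choices made for illustration. It uses only the well-known fact that the score of a zero-mean Gaussian with covariance Σ is -Σ⁻¹x.

```python
import numpy as np


def gaussian_score(x, cov):
    """Score (gradient of the log-density) of a zero-mean Gaussian N(0, cov) at x."""
    # For N(0, cov), grad_x log p(x) = -cov^{-1} x.
    return -np.linalg.solve(cov, x)


# Hypothetical toy setup (an assumption for illustration only):
# axis 0 is the "on-manifold" direction with large variance,
# axis 1 is the "off-manifold" direction carrying only small noise.
signal_var = 1.0
noise_var = 0.01
cov = np.diag([signal_var, noise_var])

# A point one standard deviation out along each axis.
x = np.array([np.sqrt(signal_var), np.sqrt(noise_var)])

score = gaussian_score(x, cov)

# The standard score pulls far more strongly along the low-variance axis,
# so its components already separate small from large variations.
print("score:", score)
print("off/on magnitude ratio:", abs(score[1]) / abs(score[0]))
```

In this toy case the score component along the low-variance axis is about ten times larger than along the high-variance axis, which suggests why information derived from the standard score could be used to single out small, noise-like variations. How the extended score actually does this is not specified in the text above and is not reproduced in the sketch.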