Score diffusion methods can learn probability densities from samples. The score of the noise-corrupted density is estimated with a deep neural network, which is then used to iteratively transport a Gaussian white noise density to the target density. Variants for conditional densities have been developed, but correct estimation of the corresponding scores is difficult. We avoid these difficulties by introducing an algorithm that guides the diffusion with a projected score. The projection pushes the image feature vector toward the feature-vector centroid of the target class. The projected score and the feature vectors are learned by the same network. Specifically, the image feature vector is defined as the spatial averages of the channel activations in selected layers of the network. Optimizing the projected score with a denoising loss encourages the image feature vectors of each class to cluster around their centroid, and also separates the centroids of different classes. We show that these centroids provide a low-dimensional Euclidean embedding of the class-conditional densities. We demonstrate that the algorithm can generate high-quality, diverse samples from the conditioning class. Conditional generation can also be performed with feature vectors interpolated between those of the training set, demonstrating out-of-distribution generalization.
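The two operations at the core of the method can be illustrated with a minimal sketch. This is an assumption-laden toy in NumPy, not the authors' implementation: `feature_vector` computes the spatial averages of channel activations from a hypothetical list of selected-layer outputs, and `project_toward_centroid` shows one simple way a feature vector could be pushed toward a class centroid (the `strength` parameter and the linear interpolation form are illustrative choices, not from the paper).

```python
import numpy as np

def feature_vector(activations):
    """Spatial averages of channel activations, concatenated across layers.

    activations: list of (C, H, W) arrays, one per selected network layer
    (hypothetical interface). Each channel contributes its mean over the
    spatial dimensions, giving a vector of total length sum of the C's.
    """
    return np.concatenate([a.mean(axis=(1, 2)) for a in activations])

def project_toward_centroid(f, centroid, strength=0.1):
    """Push feature vector f toward the target-class centroid.

    A simple linear pull; the actual projection used to guide the
    diffusion may differ -- this only illustrates the idea.
    """
    return f + strength * (centroid - f)
```

During sampling, such a pull applied at each diffusion step would bias the trajectory so that the generated image's feature vector lands near the centroid of the conditioning class.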