A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), in which a model is trained to $\textit{classify}$ between data samples and samples from some reference distribution. DRE-based models can directly output the likelihood of any given input, a highly desirable property that most generative techniques lack. Nevertheless, to date, DRE methods have failed to accurately capture the distributions of complex high-dimensional data, such as images, and have thus drawn diminishing research attention in recent years. In this work we present $\textit{classification diffusion models}$ (CDMs), a DRE-based generative method that adopts the formalism of denoising diffusion models (DDMs) while employing a classifier that predicts the level of noise added to a clean signal. Our method is based on an analytical connection that we derive between the MSE-optimal denoiser for removing white Gaussian noise and the cross-entropy-optimal classifier for predicting the noise level. Our method is the first DRE-based technique that can successfully generate images beyond the MNIST dataset. Furthermore, it can output the likelihood of any input in a single forward pass, achieving state-of-the-art negative log-likelihood (NLL) among methods with this property. Code is available on the project's webpage at https://shaharYadin.github.io/CDM/ .