Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process, and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility. First, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves state-of-the-art likelihood on CIFAR-10. Second, we let the ratio of the noise variance of the reverse encoder process to that of the generative process be a free weight parameter rather than fixing it to 1. This leads to theoretical insights: for a finite-depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 for the ELBO to be well defined.
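As a rough illustration of the first change, the forward marginal of a standard diffusion model can be set beside one with a data- and depth-dependent mean. The notation below is an assumption made for this sketch and is not taken from the abstract: $\alpha_t$ and $\sigma_t$ denote the signal and noise scales at depth $t$, and $x_\phi(x, t)$ is a hypothetical learned encoder with parameters $\phi$.

```latex
% Standard diffusion: the mean of z_t is a fixed rescaling of the data x.
q(z_t \mid x) = \mathcal{N}\!\left(z_t;\ \alpha_t\, x,\ \sigma_t^2 I\right)

% Sketch of a data- and depth-dependent mean (notation assumed):
% the data x is replaced by an encoding that may vary with the depth t.
q(z_t \mid x) = \mathcal{N}\!\left(z_t;\ \alpha_t\, x_\phi(x, t),\ \sigma_t^2 I\right)

% The standard model is recovered when the encoder is the identity:
x_\phi(x, t) = x \quad \text{for all } t .
```

Under this reading, the added flexibility sits entirely in the mean of the diffusion process, which is why the diffusion loss is modified rather than replaced.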