Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. First, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves a statistically significant improvement in likelihood on CIFAR-10. Second, we let the ratio between the noise variances of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to two theoretical insights. For a finite-depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 for the ELBO to be well defined.