Diffusion Probabilistic Models (DPMs) have shown a strong capacity for generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding. Their key idea is to jointly train an encoder that discovers meaningful representations from images and a conditional DPM that serves as the decoder for reconstructing images. Because training DPMs from scratch takes a long time and numerous pre-trained DPMs already exist, we propose \textbf{P}re-trained \textbf{D}PM \textbf{A}uto\textbf{E}ncoding (\textbf{PDAE}), a general method for adapting existing pre-trained DPMs into decoders for image reconstruction, with better training efficiency and performance than Diff-AE. Specifically, we find that pre-trained DPMs fail to reconstruct an image from its latent variables because of the information loss in the forward process, which causes a gap between their predicted posterior mean and the true one. From this perspective, the classifier-guided sampling method can be explained as computing an extra mean shift to fill the gap, reconstructing the lost class information in samples. Together, these observations imply that the gap corresponds to the lost information of the image, and that we can reconstruct the image by filling the gap. Drawing inspiration from this, we employ a trainable model to predict a mean shift from the encoded representation and train it to fill as much of the gap as possible. In this way, the encoder is forced to learn as much information as possible from images to help the filling. By reusing part of the network of pre-trained DPMs and redesigning the weighting scheme of the diffusion loss, PDAE can learn meaningful representations from images efficiently. Extensive experiments demonstrate the effectiveness, efficiency, and flexibility of PDAE.
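
The mean-shift view described above can be sketched in equations. The notation here is an assumption introduced for illustration and does not appear in the abstract: $\mu_\theta$ and $\Sigma_\theta$ denote the predicted posterior mean and covariance of the pre-trained DPM, $p_\phi(y \mid x_t)$ is a classifier on noisy images, $z$ is the encoded representation, and $G_\psi$ is a hypothetical name for the trainable mean-shift predictor. In the usual formulation of classifier-guided sampling, the reverse step is approximately
\[
p(x_{t-1} \mid x_t, y) \approx \mathcal{N}\bigl(x_{t-1};\ \mu_\theta(x_t, t) + \Sigma_\theta(x_t, t)\,\nabla_{x_t} \log p_\phi(y \mid x_t),\ \Sigma_\theta(x_t, t)\bigr),
\]
so the extra term $\Sigma_\theta \nabla_{x_t} \log p_\phi(y \mid x_t)$ plays the role of the mean shift that restores class information. By analogy, and assuming the learned shift is scaled by the covariance in the same way, conditioning on the representation $z$ could be written as
\[
p(x_{t-1} \mid x_t, z) \approx \mathcal{N}\bigl(x_{t-1};\ \mu_\theta(x_t, t) + \Sigma_\theta(x_t, t)\,G_\psi(x_t, z, t),\ \Sigma_\theta(x_t, t)\bigr),
\]
where $G_\psi$ is trained so that the shifted mean moves as close as possible to the true posterior mean.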