Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs). However, despite their widespread use in image synthesis and editing applications, their latent space is still not as well understood as that of GANs. Recently, a semantic latent space for DDMs, coined `$h$-space', was shown to facilitate semantic image editing in a way reminiscent of GANs. The $h$-space comprises the bottleneck activations in the DDM's denoiser across all timesteps of the diffusion process. In this paper, we explore the properties of $h$-space and propose several novel methods for finding meaningful semantic directions within it. We start by studying unsupervised methods for revealing interpretable semantic directions in pretrained DDMs. Specifically, we show that global latent directions emerge as the principal components in the latent space. Additionally, we provide a novel method for discovering image-specific semantic directions by spectral analysis of the Jacobian of the denoiser w.r.t. the latent code. Next, we extend the analysis by finding directions in a supervised fashion in unconditional DDMs. We demonstrate how such directions can be found either by relying on a labeled data set of real images or by annotating generated samples with a domain-specific attribute classifier. We further show how to semantically disentangle the found directions by simple linear projection. Our approaches require no architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
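To make two of the operations named above concrete, the following is a minimal NumPy sketch, not the implementation used in the paper. It assumes that $h$-space activations have been flattened into one vector per sample, that "principal components" means ordinary PCA on those vectors, and that "linear projection" means removing from one direction its component along another. The function names `principal_directions` and `project_out`, the dimensions, and the random stand-in data are all hypothetical.

```python
import numpy as np


def principal_directions(h_samples, k):
    """Return the top-k principal axes of flattened activations (rows = samples)."""
    centered = h_samples - h_samples.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def project_out(direction, unwanted):
    """Remove from `direction` its component along `unwanted`, then renormalize."""
    u = unwanted / np.linalg.norm(unwanted)
    cleaned = direction - np.dot(direction, u) * u
    return cleaned / np.linalg.norm(cleaned)


rng = np.random.default_rng(0)

# Stand-in for flattened bottleneck activations of 200 samples (not real DDM data).
h_samples = rng.normal(size=(200, 16)) * np.linspace(3.0, 0.5, 16)
components = principal_directions(h_samples, k=2)

# Two hypothetical attribute directions; d_a_clean no longer moves along d_b.
d_a = rng.normal(size=16)
d_b = rng.normal(size=16)
d_a_clean = project_out(d_a, d_b)
```

Under these assumptions, an edit along `d_a_clean` leaves the coordinate along `d_b` unchanged, which is one plausible reading of disentangling a direction by linear projection.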