Despite the success of diffusion models (DMs), we still lack a thorough understanding of their latent space. Whereas GAN-based image editing operates in the latent space, DMs rely on editing conditions such as text prompts. We present an unsupervised method to discover interpretable editing directions for the latent variables $\mathbf{x}_t \in \mathcal{X}$ of DMs. Our method adopts Riemannian geometry between $\mathcal{X}$ and the intermediate feature maps $\mathcal{H}$ of the U-Nets to provide a deeper understanding of the geometrical structure of $\mathcal{X}$. The discovered semantic latent directions mostly yield disentangled attribute changes, and they are globally consistent across different samples. Furthermore, editing at earlier timesteps changes coarse attributes, while editing at later timesteps focuses on high-frequency details. We define the curvedness of a line segment between samples to show that $\mathcal{X}$ is a curved manifold. Experiments on different baselines and datasets demonstrate the effectiveness of our method, even on Stable Diffusion. Our source code will be made publicly available for future researchers.
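The abstract does not spell out how the directions are extracted. One standard way to relate $\mathcal{X}$ and $\mathcal{H}$ geometrically is the pullback metric $G(\mathbf{x}) = J^\top J$, where $J = \partial \mathbf{h} / \partial \mathbf{x}$ is the Jacobian of the map from $\mathbf{x}_t$ to the feature map $\mathbf{h}$. Under this metric, the unit directions $\mathbf{v}$ that maximize $\lVert J\mathbf{v} \rVert$ are the top right singular vectors of $J$. The sketch below assumes that formulation and is not the authors' implementation. The U-Net is replaced by a toy 3-to-3 map, finite differences stand in for automatic differentiation, and the helper names `jacobian` and `top_directions` are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian(f: Callable[[Vector], Vector], x: Vector, eps: float = 1e-5) -> Matrix:
    """Central finite-difference Jacobian, J[i][j] = d f_i / d x_j at x.
    (Assumption: stands in for autodiff through a real U-Net encoder.)"""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


def matvec(A: Matrix, v: Vector) -> Vector:
    """Compute A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def rmatvec(A: Matrix, u: Vector) -> Vector:
    """Compute A^T u."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero vector: requested more directions than rank(J)")
    return [c / norm for c in v]


def top_directions(J: Matrix, k: int, iters: int = 200, seed: int = 0) -> List[Vector]:
    """Top-k eigenvectors of the pullback metric G = J^T J (equivalently the
    top-k right singular vectors of J), via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(J[0])
    dirs: List[Vector] = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(iters):
            w = rmatvec(J, matvec(J, v))  # apply G = J^T J
            for d in dirs:  # project out directions already found
                dot = sum(a * b for a, b in zip(w, d))
                w = [a - dot * b for a, b in zip(w, d)]
            v = normalize(w)
        dirs.append(v)
    return dirs


if __name__ == "__main__":
    # Toy "feature map": the first coordinate is stretched most strongly at x = 0.
    toy_h = lambda x: [math.tanh(3.0 * x[0]), x[1], 0.5 * x[2]]
    J = jacobian(toy_h, [0.0, 0.0, 0.0])
    for v in top_directions(J, k=2):
        print([round(c, 3) for c in v])
```

In this toy example the first direction aligns with the first coordinate (up to sign) because the map stretches that axis most strongly. The second direction is orthogonal to the first. A real implementation would presumably compute Jacobian-vector products with autodiff rather than forming $J$ explicitly, since $\mathbf{x}_t$ is high-dimensional.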