Although diffusion models have achieved remarkable success in image generation, their latent space remains under-explored. Current methods for identifying semantics in this latent space often rely on external supervision, such as textual information or segmentation masks. In this paper, we propose a method that identifies semantic attributes in the latent space of pre-trained diffusion models without any further training. By projecting the Jacobian of the targeted semantic region onto a low-dimensional subspace orthogonal to the non-masked regions, our approach enables precise semantic discovery and control over local masked areas without requiring annotations. We conduct extensive experiments across multiple datasets and various diffusion model architectures, achieving state-of-the-art performance. In particular, for certain face attributes, our method even outperforms supervised approaches, demonstrating its superior ability to edit local image properties.
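The abstract does not spell out the exact procedure, so the following is only a minimal NumPy sketch of the general idea under our own assumptions. The model is treated as a differentiable map from a latent vector to a flattened output, and its Jacobian is estimated by central finite differences. "Orthogonal to the non-masked regions" is interpreted as projecting out the top-`r` right singular vectors of the non-masked rows of the Jacobian, then taking an SVD of the masked rows restricted to the remaining subspace. The helper names `jacobian_fd` and `masked_edit_directions`, the rank `r`, and the toy `tanh` model are hypothetical and are not the paper's implementation.

```python
import numpy as np


def jacobian_fd(f, z, eps=1e-5):
    """Central finite-difference Jacobian of f: R^d -> R^n at z, shape (n, d)."""
    z = np.asarray(z, dtype=float)
    n = np.asarray(f(z), dtype=float).size
    J = np.zeros((n, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        diff = np.asarray(f(z + dz), dtype=float) - np.asarray(f(z - dz), dtype=float)
        J[:, i] = diff.ravel() / (2 * eps)
    return J


def masked_edit_directions(J, mask, r, k):
    """Hypothetical helper: latent directions that move the masked outputs
    while (to first order) leaving the non-masked outputs unchanged.

    J    : (n, d) Jacobian of the flattened output w.r.t. the latent.
    mask : boolean array of length n, True inside the edit region.
    r    : assumed rank kept for the non-masked Jacobian's row space.
    k    : number of edit directions to return.
    """
    J_in, J_out = J[mask], J[~mask]
    # Top-r right singular vectors: latent directions that most affect
    # the non-masked region.
    _, _, Vt_out = np.linalg.svd(J_out, full_matrices=False)
    V_out = Vt_out[:r].T                      # (d, r)
    # Projector onto the orthogonal complement of those directions.
    P = np.eye(J.shape[1]) - V_out @ V_out.T
    # Directions inside that complement with the largest effect on the mask.
    _, _, Vt = np.linalg.svd(J_in @ P, full_matrices=False)
    return Vt[:k].T                           # (d, k), orthonormal columns


# Toy usage: a random tanh layer stands in for a diffusion model (assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 10))
f = lambda z: np.tanh(A @ z)
z0 = 0.1 * rng.standard_normal(10)
mask = np.zeros(8, dtype=bool)
mask[:4] = True                               # first 4 outputs = "masked region"

J = jacobian_fd(f, z0)
V = masked_edit_directions(J, mask, r=4, k=2)
```

Keeping only the top-`r` singular vectors of the non-masked Jacobian is what makes the orthogonality approximate in a real model, where that Jacobian is usually full rank. In this toy example `r` equals its rank, so moving along the columns of `V` leaves the non-masked outputs unchanged to first order.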