Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual concept (e.g., "a doctor", "love"). However, the internal process of mapping text to a rich visual representation remains an enigma. In this work, we tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied to the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts, such as "a president" or "a composer", are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness", combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation. Our code will be available at: https://hila-chefer.github.io/Conceptor/
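To make the notion of a pseudo-token as a sparse weighted combination of vocabulary tokens concrete, the following is a minimal sketch on toy data. It is not the method's implementation: the weights here are fixed by hand rather than learned from an image-reconstruction objective, the 3-dimensional embeddings are hypothetical stand-ins for the model's token embeddings, and top-k truncation is only one assumed way to obtain sparsity. The helper names `top_k_sparsify` and `pseudo_token` are illustrative.

```python
def top_k_sparsify(weights, k):
    """Keep the k largest weights and zero out the rest (assumed sparsity rule)."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def pseudo_token(embeddings, weights):
    """Return the weighted sum of embedding rows, one weight per vocabulary token."""
    dim = len(embeddings[0])
    out = [0.0] * dim
    for w, row in zip(weights, embeddings):
        if w == 0.0:
            continue  # tokens with zero weight do not contribute
        for j in range(dim):
            out[j] += w * row[j]
    return out


# Toy vocabulary with hypothetical 3-d embeddings (not Stable Diffusion's).
vocab = ["Obama", "Biden", "suit", "banana"]
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Hand-picked weights; in the described method these would be learned.
weights = [0.6, 0.3, 0.08, 0.02]

sparse_weights = top_k_sparsify(weights, k=2)
token = pseudo_token(embeddings, sparse_weights)

# The interpretable decomposition is the list of tokens with non-zero weight.
decomposition = [(t, w) for t, w in zip(vocab, sparse_weights) if w != 0.0]
print(decomposition)
print(token)
```

In this sketch, the surviving (token, weight) pairs play the role of the interpretable elements, and the resulting vector plays the role of the pseudo-token that would be fed to the model in place of the concept's own embedding.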