The Hidden Language of Diffusion Models

Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual concept (e.g., "a doctor", "love"). However, the internal process of mapping text to a rich visual representation remains an enigma. In this work, we tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied to the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts, such as "a president" or "a composer", are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness", combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation. Our code will be available at: https://hila-chefer.github.io/Conceptor/
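
As a rough illustration of what "a sparse weighted combination of tokens from the model's vocabulary" could look like, the sketch below builds a pseudo-token from toy embeddings. Everything in it is an assumption made for illustration: the function name sparse_pseudo_token, the 4-dimensional random embeddings, the hand-picked weights, and the use of top-k selection as the sparsity mechanism. The abstract does not state how the weights are learned or how sparsity is enforced, and the reconstruction objective over generated images is omitted entirely.

```python
import random


def sparse_pseudo_token(embeddings, weights, k):
    """Combine the k highest-weight vocabulary embeddings into one vector."""
    # Sparsity (assumed here to be top-k): keep only the k largest weights.
    top = sorted(weights, key=weights.get, reverse=True)[:k]
    dim = len(next(iter(embeddings.values())))
    pseudo = [0.0] * dim
    # Weighted sum of the selected token embeddings.
    for token in top:
        for i, value in enumerate(embeddings[token]):
            pseudo[i] += weights[token] * value
    return top, pseudo


# Toy 4-dimensional embeddings; a real model would supply its own.
random.seed(0)
vocab = ["family", "laughter", "friendship", "emotion", "car", "stone"]
embeddings = {t: [random.uniform(-1, 1) for _ in range(4)] for t in vocab}

# Hypothetical weights, chosen by hand; in the described method they are learned.
weights = {"family": 0.9, "laughter": 0.7, "friendship": 0.5,
           "emotion": 0.4, "car": 0.01, "stone": 0.0}

top_tokens, pseudo_token = sparse_pseudo_token(embeddings, weights, k=4)
print(top_tokens)
```

In this toy setup the surviving tokens play the role of the interpretable elements for a concept such as "happiness", and the weighted sum stands in for the pseudo-token that would be fed to the text-to-image model in place of the original concept.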

