Editing Implicit Assumptions in Text-to-Image Diffusion Models

Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME succeeds at model editing, generalizes well to related prompts unseen during editing, and has minimal effect on unrelated generations.
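To make the projection-editing idea concrete, here is a minimal sketch of one way such an edit could be computed. The abstract does not state the objective, so the following is an assumption: the edited matrix is chosen to map each source token embedding close to the original projection of the corresponding destination token embedding, with a penalty (weight `lam`) that keeps the edited matrix near the original. The function name `edit_projection`, the parameter `lam`, and the one-to-one token alignment between the two prompts are hypothetical and used only for illustration; random arrays stand in for real text-encoder embeddings.

```python
import numpy as np


def edit_projection(W, src_emb, dst_emb, lam=0.1):
    """Return an edited projection matrix W_new (illustrative sketch).

    Assumed objective (not taken from the abstract):
        sum_i ||W_new @ c_i - W @ c*_i||^2 + lam * ||W_new - W||_F^2
    where c_i are source token embeddings and c*_i the aligned
    destination token embeddings.

    W:       (d_out, d_in) original projection matrix
    src_emb: (n_tokens, d_in) source prompt token embeddings
    dst_emb: (n_tokens, d_in) aligned destination token embeddings
    """
    # Targets: what the ORIGINAL matrix produces for the destination tokens.
    targets = dst_emb @ W.T                          # (n_tokens, d_out)
    d_in = W.shape[1]
    # Setting the gradient to zero gives W_new @ rhs = lhs.
    lhs = lam * W + targets.T @ src_emb              # (d_out, d_in)
    rhs = lam * np.eye(d_in) + src_emb.T @ src_emb   # (d_in, d_in), symmetric
    # Solve instead of forming an explicit inverse; rhs is symmetric,
    # so (rhs^{-1} @ lhs.T).T equals lhs @ rhs^{-1}.
    return np.linalg.solve(rhs, lhs.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16))      # stand-in for a key or value projection
    src = rng.normal(size=(4, 16))    # stand-in source token embeddings
    dst = rng.normal(size=(4, 16))    # stand-in destination token embeddings
    W_new = edit_projection(W, src, dst, lam=0.1)
    before = np.linalg.norm(src @ W.T - dst @ W.T)
    after = np.linalg.norm(src @ W_new.T - dst @ W.T)
    print(f"projection gap before edit: {before:.3f}, after edit: {after:.3f}")
```

Under this assumed objective the update is a single linear solve per matrix, which would be consistent with an edit that finishes quickly and touches only the cross-attention projections. A larger `lam` keeps the edited matrix closer to the original, trading edit strength for less effect on unrelated prompts.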

