During pre-training, Text-to-Image (T2I) diffusion models encode factual knowledge into their parameters. These parameterized facts enable realistic image generation, but they may become obsolete over time, thereby misrepresenting the current state of the world. Knowledge editing techniques aim to update model knowledge in a targeted way. However, facing the dual challenges of inadequate editing datasets and unreliable evaluation criteria, T2I knowledge editing struggles to generalize injected knowledge effectively. In this work, we design a T2I knowledge editing framework that comprehensively spans three phases: First, we curate a dataset, \textbf{CAKE}, comprising paraphrase and multi-object tests, to enable finer-grained assessment of knowledge generalization. Second, we propose a novel criterion, the \textbf{adaptive CLIP threshold}, which filters out images falsely deemed successful under the current criterion and enables reliable editing evaluation. Finally, we introduce \textbf{MPE}, a simple but effective approach to T2I knowledge editing. Instead of tuning parameters, MPE precisely recognizes and edits the outdated part of the conditioning text prompt to accommodate up-to-date knowledge. A straightforward implementation of MPE (based on in-context learning) exhibits better overall performance than previous model editors. We hope these efforts can further promote faithful evaluation of T2I knowledge editing methods.