Text-to-image generative models can produce diverse, high-quality images of concepts from a text prompt and have demonstrated excellent ability in image generation, image translation, and related tasks. In this work, we study the problem of synthesizing instantiations of a user's own concepts in a never-ending manner, i.e., "create your world", where new concepts from the user are quickly learned from a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which aims to overcome knowledge "catastrophic forgetting" of previously encountered concepts and semantic "catastrophic neglecting" of one or more concepts in the text prompt. To address knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module, which respectively safeguard the knowledge of prior concepts and of each past personalized concept. To address semantic "catastrophic neglecting" when generating images from a user text prompt, a concept attention artist module alleviates semantic neglecting at the concept level, and an orthogonal attention module reduces semantic binding at the attribute level. As a result, our model can generate more faithful images across a range of continual text prompts than related state-of-the-art models, in terms of both qualitative and quantitative metrics. The code will be released at https://wenqiliang.github.io/.