Pre-trained large text-to-image (T2I) models, guided by an appropriate text prompt, have attracted growing interest in the field of customized image generation. However, catastrophic forgetting makes it hard to continually synthesize new user-provided styles while retaining satisfactory results on previously learned styles. In this paper, we propose MuseumMaker, a method that synthesizes images following a set of customized styles in a never-ending manner and gradually accumulates these creative artistic works as a Museum. When facing a new customization style, we develop a style distillation loss module to transfer the style of the whole dataset into the generation of images. This module minimizes the learning bias caused by image content and addresses the catastrophic overfitting induced by few-shot images. To deal with catastrophic forgetting among previously learned styles, we devise a dual regularization for a shared-LoRA module to optimize the direction of the model update, regularizing the diffusion model from both the weight and the feature aspects. Meanwhile, a unique token embedding corresponding to the new style is learned by a task-wise token learning module, which preserves historical knowledge from past styles under the constraint of a limited number of LoRA parameters. As each new user-provided style arrives, MuseumMaker captures the nuances of the new style while maintaining the details of learned styles. Experimental results on diverse style datasets validate the effectiveness of the proposed MuseumMaker method, showcasing its robustness and versatility across various scenarios.