Lifelong few-shot customization for text-to-image diffusion aims to continually adapt existing models to new tasks with minimal data while preserving old knowledge. Current customization diffusion models excel at few-shot tasks but suffer from catastrophic forgetting in lifelong generation. In this study, we identify and categorize catastrophic forgetting into two types: relevant concepts forgetting and previous concepts forgetting. To address these challenges, we first devise a data-free knowledge distillation strategy to tackle relevant concepts forgetting. Unlike existing methods that rely on additional real data or offline replay of original concept data, our approach enables on-the-fly knowledge distillation that retains previous concepts while learning new ones, without accessing any previous data. Second, we develop an In-Context Generation (ICGen) paradigm that conditions the diffusion model on the input visual context, which facilitates few-shot generation and mitigates previous concepts forgetting. Extensive experiments show that the proposed Lifelong Few-Shot Diffusion (LFS-Diffusion) method produces high-quality and accurate images while preserving previously learned knowledge.