The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models. Our method, dubbed Selective Amnesia, enables controllable forgetting where a user can specify how a concept should be forgotten. Selective Amnesia can be applied to conditional variational likelihood models, which encompass a variety of popular deep generative frameworks, including variational autoencoders and large-scale text-to-image diffusion models. Experiments across different models demonstrate that our approach induces forgetting on a variety of concepts, from entire classes in standard datasets to celebrity and nudity prompts in text-to-image models. Our code is publicly available at https://github.com/clear-nus/selective-amnesia.