Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

Unlearning in deep learning models, once primarily an academic concern, has become a pressing issue in industry. Significant advances in text-to-image generation have prompted global discussions on privacy, copyright, and safety, because these models have learned numerous unauthorized personal identities, content, artistic creations, and potentially harmful materials, and are later used to generate and distribute uncontrolled content. To address this challenge, we propose \textbf{Forget-Me-Not}, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the \textbf{Memorization Score (M-Score)} and \textbf{ConceptBench} to measure a model's capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion, and diversity through \textbf{concept correction and disentanglement}. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at \href{https://github.com/SHI-Labs/Forget-Me-Not}{https://github.com/SHI-Labs/Forget-Me-Not}.
