Machine unlearning (MU) empowers individuals with the 'right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly when the goal is to forget leaked visual data of concepts. To address this challenge, we propose an efficient method, Single Image Unlearning (SIU), which unlearns the visual recognition of a concept by fine-tuning on a single associated image for a few steps. SIU consists of two key aspects: (i) Constructing multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Joint training loss. To simultaneously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs with a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs, and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU surpasses the performance of existing methods. Furthermore, we find, surprisingly, that SIU can withstand invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.