Machine unlearning has emerged as a prominent and challenging research area, driven in large part by rising regulatory demands for industries to delete user data upon request and by heightened awareness of privacy. Existing approaches either retrain models from scratch or apply several finetuning steps for every deletion request, and are often constrained by limited computational resources and restricted access to the original training data. In this work, we introduce a novel class unlearning algorithm designed to eliminate an entire class or a group of classes from a learned model. To that end, our algorithm first estimates the Retain Space and the Forget Space, which represent the feature or activation spaces for samples from the classes to be retained and unlearned, respectively. To obtain these spaces, we propose a novel singular value decomposition-based technique that requires only layer-wise collection of network activations from a few forward passes through the network. We then compute the information shared between these spaces and remove it from the Forget Space to isolate the class-discriminatory feature space used for unlearning. Finally, we project the model weights onto the direction orthogonal to the class-discriminatory space to obtain the unlearned model. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer, with only a $\sim$1.5% drop in retain accuracy compared to the original model while maintaining under 1% accuracy on samples from the unlearned class. Further, our algorithm consistently performs well when subjected to Membership Inference Attacks, showing a 7.8% improvement on average over other baselines across a variety of image classification datasets and network architectures, while being $\sim$6x more computationally efficient.
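The pipeline described in the abstract (estimate the Retain and Forget Spaces by SVD, remove their shared part, project the weights) can be illustrated for a single linear layer with the minimal NumPy sketch below. It is one plausible reading of the abstract, not the authors' implementation. The function names, the `energy` threshold used to truncate the SVD, the use of unweighted projectors, and the convention that `W` has shape (outputs, inputs) and acts on input activations are all assumptions introduced here for illustration.

```python
import numpy as np


def subspace_basis(acts, energy=0.95):
    """Orthonormal basis of the dominant activation subspace.

    acts: array of shape (features, samples) collected from forward passes.
    energy: assumed cutoff on the cumulative squared singular values.
    """
    U, S, _ = np.linalg.svd(acts, full_matrices=False)
    cum = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(cum, energy)) + 1  # smallest k reaching the cutoff
    return U[:, :k]


def unlearn_projection(retain_acts, forget_acts, energy=0.95):
    """Return I - P_dis, where P_dis is the forget projector with the
    component shared with the retain space removed (assumed form)."""
    d = retain_acts.shape[0]
    U_r = subspace_basis(retain_acts, energy)
    U_f = subspace_basis(forget_acts, energy)
    P_r = U_r @ U_r.T                    # projector onto the Retain Space
    P_f = U_f @ U_f.T                    # projector onto the Forget Space
    P_dis = P_f @ (np.eye(d) - P_r)      # forget-only (class-discriminatory) part
    return np.eye(d) - P_dis


def unlearn_layer(W, retain_acts, forget_acts, energy=0.95):
    """Project a weight matrix W (outputs x inputs) away from P_dis."""
    return W @ unlearn_projection(retain_acts, forget_acts, energy)


# Toy example: the retain activations span axes 0 and 1, the forget
# activations span axes 1 and 2, so axis 1 is shared and axis 2 is
# the only forget-specific direction.
retain = np.array([[1., 0., 1., 0.],
                   [0., 1., 0., 1.],
                   [0., 0., 0., 0.],
                   [0., 0., 0., 0.]])
forget = np.array([[0., 0., 0., 0.],
                   [1., 0., 1., 0.],
                   [0., 1., 0., 1.],
                   [0., 0., 0., 0.]])
W = np.arange(12, dtype=float).reshape(3, 4)
W_new = unlearn_layer(W, retain, forget)
# Only the column of W that reads the forget-specific axis 2 is zeroed;
# the shared axis 1 and the retain-only axis 0 are left untouched.
```

In this toy setting the shared direction survives because it is subtracted from the Forget Space before the weights are projected, which is the step the abstract credits with preserving retain accuracy. In a real network the same computation would be repeated per layer on activations gathered from a few forward passes.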