Machine unlearning is a newly emerged technique that removes the influence of a subset of the training data from a trained model without degrading the model's performance on the remaining data. The topic is becoming increasingly important for protecting user privacy and eliminating harmful or outdated data. The key challenge lies in effectively and efficiently unlearning specific information without compromising the model's utility on the retained data. For pre-trained models, fine-tuning is an important way to achieve the unlearning target. Previous work typically fine-tunes the entire model's parameters, which incurs significant computational cost. In addition, the fine-tuning process may shift the intermediate-layer features and thereby degrade the model's overall utility. In this work, we propose a novel and efficient machine unlearning method for pre-trained models, which we term Residual Feature Alignment Unlearning. Specifically, we leverage LoRA (Low-Rank Adaptation) to decompose the model's intermediate features into pre-trained features and residual features. By adjusting the residual features, we align the unlearned model with the pre-trained model at the intermediate-feature level to achieve both the unlearning and retention targets. The method aims to learn zero residuals on the retained set and shifted residuals on the unlearning set. Extensive experiments on numerous datasets validate the effectiveness of our approach.
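For concreteness, the following is a minimal PyTorch sketch of the residual decomposition described above. The class and function names (`LoRALinear`, `residual_alignment_loss`), the L2 form of the two alignment terms, and the `target_shift` tensor are illustrative assumptions, not the paper's exact formulation; they only show how a frozen pre-trained branch plus a trainable low-rank residual branch can be steered toward zero residuals on the retained set and shifted residuals on the unlearning set.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank residual branch.

    The layer output decomposes as
        h = W0 x + (alpha / r) * B A x
    where W0 x is the frozen pre-trained feature and the second term is the
    residual feature, so only the residual branch controls how far the
    unlearned model's intermediate features drift from the pre-trained ones.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pre-trained weights fixed
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def residual(self, x: torch.Tensor) -> torch.Tensor:
        # Residual feature: (alpha / r) * B A x
        return (x @ self.lora_A.T @ self.lora_B.T) * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-trained feature + residual feature
        return self.base(x) + self.residual(x)


def residual_alignment_loss(layer: LoRALinear,
                            x_retain: torch.Tensor,
                            x_forget: torch.Tensor,
                            target_shift: torch.Tensor,
                            lam: float = 1.0) -> torch.Tensor:
    """Hypothetical alignment objective (assumed form, for illustration only):
    - retained set: drive residual features toward zero, keeping the
      intermediate features aligned with the pre-trained model;
    - unlearning set: drive residual features toward a prescribed shift,
      moving the features away from their pre-trained values.
    """
    r_retain = layer.residual(x_retain)
    r_forget = layer.residual(x_forget)
    loss_retain = r_retain.pow(2).mean()                   # zero-residual target
    loss_forget = (r_forget - target_shift).pow(2).mean()  # shifted-residual target
    return loss_retain + lam * loss_forget
```

In such a setup only `lora_A` and `lora_B` receive gradients, so the cost of the unlearning fine-tuning is a small fraction of full-parameter fine-tuning, and the pre-trained features remain available unchanged as the alignment reference.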