With the explosive growth of deep learning applications and increasing privacy concerns, the right to be forgotten has become a critical requirement across AI industries. For example, given a facial recognition system, some individuals may wish to remove their personal data that might have been used in the training phase. Unfortunately, deep neural networks sometimes unexpectedly leak personal identities, making this removal challenging. While recent machine unlearning algorithms aim to enable models to forget specific data, we identify an unintended utility drop, which we term correlation collapse, in which the essential correlations between image features and true labels weaken during the forgetting process. To address this challenge, we propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preserving task-relevant feature correlations. Our method synthesizes data samples by optimizing their feature distribution to be distinctly different from that of the forget samples, achieving effective results within a single training epoch. Through extensive experiments on facial recognition datasets, we demonstrate that our approach significantly outperforms state-of-the-art machine unlearning methods in both forgetting performance and model utility preservation.
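To make the core idea concrete, the sketch below illustrates feature distancing in its simplest form. This is not the paper's DLFD objective: the actual method, distance metric, and optimization are not specified in the abstract, so this toy uses a squared distance between feature means as a stand-in distribution distance, and the function names (`feature_mean_distance`, `distance_features`) are illustrative only. It shows only the general mechanism: gradient ascent that pushes synthesized samples' features away from the forget set's feature distribution.

```python
import numpy as np

def feature_mean_distance(a, b):
    # Toy proxy for a distribution distance: squared Euclidean
    # distance between the two feature sets' means.
    return np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)

def distance_features(synth, forget, lr=0.1, steps=50):
    """Gradient-ascent sketch: push synthesized features away from
    the forget set's feature distribution (mean-distance proxy)."""
    synth = synth.copy()
    n = synth.shape[0]
    for _ in range(steps):
        # Gradient of the mean-distance objective w.r.t. each synth row.
        grad = 2.0 * (synth.mean(axis=0) - forget.mean(axis=0)) / n
        synth += lr * grad  # ascend: increase the distance
    return synth

rng = np.random.default_rng(0)
forget = rng.normal(0.0, 1.0, size=(32, 8))        # "forget" features
synth = forget + rng.normal(0.0, 0.01, size=(32, 8))  # starts near forget set
d_before = feature_mean_distance(synth, forget)
d_after = feature_mean_distance(distance_features(synth, forget), forget)
```

After the update loop, `d_after` exceeds `d_before`: the synthesized features have moved away from the forget distribution, which is the property a model can then be fine-tuned on to forget without collapsing feature-label correlations.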