Machine unlearning is an emerging field, driven by the pressing need to ensure privacy in modern artificial intelligence models. Its primary aim is to erase any residual influence of a specific subset of data from the knowledge a neural model acquires during training. This work introduces a novel unlearning algorithm, Distance-based Unlearning via Centroid Kinematics (DUCK), which employs metric learning to guide the removal of samples matching the nearest incorrect centroid in the embedding space. We evaluate the algorithm on several benchmark datasets in two distinct scenarios, class removal and homogeneous sampling removal, and obtain state-of-the-art performance. We also introduce a novel metric, the Adaptive Unlearning Score (AUS), which captures both the efficacy of the unlearning process in forgetting the target data and the performance loss relative to the original model. Additionally, we conduct a thorough investigation of the unlearning mechanism in DUCK, examining its impact on the organization of the feature space and employing explainable AI techniques for deeper insights.
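To make the notion of a "nearest incorrect centroid" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each class centroid is the mean of that class's embeddings and that distances are Euclidean; the abstract specifies neither. The function names `class_centroids` and `nearest_incorrect_centroid` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return the mean embedding per class (assumed centroid definition)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def nearest_incorrect_centroid(vec, true_label, centroids):
    """Return (label, distance) of the closest centroid of a different class.

    Euclidean distance is an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        if label == true_label:
            continue  # skip the sample's own (correct) class
        dist = math.dist(vec, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy 2-D embeddings for three classes (illustrative values only).
embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0], [0.0, 10.0], [2.0, 10.0]]
labels = [0, 0, 1, 1, 2, 2]
centroids = class_centroids(embeddings, labels)

# A sample of class 0 that is to be forgotten.
forget_sample, forget_label = [1.0, 1.0], 0
target_label, distance = nearest_incorrect_centroid(forget_sample, forget_label, centroids)
print(target_label, distance)  # prints: 1 3.0
```

In a metric-learning setting, a distance of this kind could serve as the quantity a loss term acts on for the samples to be forgotten; how DUCK actually uses it is not detailed in the abstract.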