Machine unlearning aims to eliminate the influence of a subset of training samples (i.e., the unlearning samples) from a trained model. Removing this influence effectively and efficiently without degrading overall model performance remains challenging. In this paper, we propose a contrastive unlearning framework that leverages the concept of representation learning for more effective unlearning. It removes the influence of unlearning samples by contrasting their embeddings against those of the remaining samples, so that unlearning samples are pushed away from their original classes and pulled toward other classes. By directly optimizing the representation space, it effectively removes the influence of unlearning samples while preserving the representations learned from the remaining samples. Experiments across a variety of datasets and models, on both class unlearning and sample unlearning, show that contrastive unlearning achieves the strongest unlearning effect and the highest efficiency with the lowest performance loss compared with state-of-the-art algorithms.
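To make the push-away/pull-toward idea concrete, the following is a minimal PyTorch sketch of an inverted supervised-contrastive objective over normalized embeddings: for each unlearning sample, remaining samples of a *different* class act as positives (pulled toward) and same-class remaining samples as negatives (pushed away). All names, the temperature value, and the exact loss form are illustrative assumptions, not the paper's published implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_unlearning_loss(z_unlearn, y_unlearn, z_remain, y_remain, tau=0.5):
    """Sketch of an inverted contrastive objective for unlearning.

    z_unlearn: (U, d) embeddings of unlearning samples
    y_unlearn: (U,)   their labels
    z_remain:  (R, d) embeddings of remaining samples in the batch
    y_remain:  (R,)   their labels
    tau:       temperature (hypothetical default)
    """
    z_u = F.normalize(z_unlearn, dim=1)              # unit-norm embeddings
    z_r = F.normalize(z_remain, dim=1)
    sim = z_u @ z_r.T / tau                          # (U, R) scaled cosine similarities

    # "Positives" are remaining samples whose class DIFFERS from the
    # unlearning sample's class: maximizing their likelihood pulls the
    # unlearning embedding toward other classes and away from its own.
    diff_class = (y_unlearn.unsqueeze(1) != y_remain.unsqueeze(0)).float()  # (U, R)

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # softmax log-probs over R
    pos_log_prob = (log_prob * diff_class).sum(1) / diff_class.sum(1).clamp(min=1.0)
    return -pos_log_prob.mean()

# Usage with random stand-in embeddings:
z_u, y_u = torch.randn(8, 128), torch.randint(0, 10, (8,))
z_r, y_r = torch.randn(64, 128), torch.randint(0, 10, (64,))
loss = contrastive_unlearning_loss(z_u, y_u, z_r, y_r)
```

In practice this term would be combined with a standard loss (e.g., cross-entropy or a supervised contrastive loss) on the remaining samples, so that their representations are maintained while the unlearning samples are displaced.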