Removing information from a machine learning model is a non-trivial task that requires partially reverting the training process. This task is unavoidable when sensitive data, such as credit card numbers or passwords, accidentally enter the model and need to be removed afterwards. Recently, different concepts for machine unlearning have been proposed to address this problem. While these approaches are effective in removing individual data points, they do not scale to scenarios where larger groups of features and labels need to be reverted. In this paper, we propose the first method for unlearning features and labels. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of the model parameters. It makes it possible to adapt the influence of training data on a learning model retrospectively, thereby correcting data leaks and privacy issues. For learning models with strongly convex loss functions, our method provides certified unlearning with theoretical guarantees. For models with non-convex losses, we empirically show that unlearning features and labels is effective and significantly faster than other strategies.
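To make the idea of a closed-form parameter update concrete, the following is a minimal sketch and not the implementation from the paper. It assumes a ridge regression model (a strongly convex loss) and applies a Newton-style, influence-function update: the gradient difference between the corrected and the original data points is multiplied by the inverse Hessian and subtracted from the trained parameters. All function names (`fit_ridge`, `grad_sum`, `unlearn_second_order`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5*||X @ theta - y||^2 + 0.5*lam*||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def grad_sum(X, y, theta):
    """Sum of per-sample squared-error gradients (regularizer excluded)."""
    return X.T @ (X @ theta - y)

def unlearn_second_order(theta, X, y, lam, idx, X_new, y_new):
    """Illustrative update: theta - H^{-1} (grad on corrected points - grad on original points).

    H is the Hessian of the regularized loss on the ORIGINAL training data.
    """
    H = X.T @ X + lam * np.eye(X.shape[1])
    delta = grad_sum(X_new, y_new, theta) - grad_sum(X[idx], y[idx], theta)
    return theta - np.linalg.solve(H, delta)

# Synthetic data (assumption: purely illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
lam = 1.0
theta = fit_ridge(X, y, lam)

# Pretend the labels of the first 10 samples must be reverted to a neutral value.
idx = np.arange(10)
y_fixed = y.copy()
y_fixed[idx] = 0.0

theta_unlearned = unlearn_second_order(theta, X, y, lam, idx, X[idx], y_fixed[idx])
theta_retrained = fit_ridge(X, y_fixed, lam)

# Largest coordinate-wise gap between the closed-form update and full retraining.
gap = np.max(np.abs(theta_unlearned - theta_retrained))
print(gap)
```

In this sketch only labels are changed, so the Hessian of the quadratic loss is unaffected and the single update coincides with retraining up to floating-point error. When feature values are changed, the Hessian changes as well, and the same update is only an approximation of retraining.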