Machine learning models are vulnerable to adversarial attacks, including attacks that leak information about the model's training data. Interest in how best to address privacy concerns has recently grown, especially in the presence of data-removal requests. Machine unlearning algorithms aim to efficiently update trained models to comply with data deletion requests while maintaining performance, without resorting to the costly alternative of retraining the model from scratch. Several algorithms in the machine unlearning literature demonstrate some level of privacy gain, but they are often evaluated only against rudimentary membership inference attacks, which do not represent realistic threats. In this paper we identify three key shortcomings in the current evaluation of unlearning algorithms and propose alternative evaluation methods that address them. We demonstrate the utility of these alternative evaluations through a series of experiments with state-of-the-art unlearning algorithms on different computer vision datasets, presenting a more detailed picture of the state of the field.
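To make concrete what a "rudimentary" membership inference attack looks like, the following is a minimal sketch of a per-example loss-threshold attack, a common weak baseline of the kind the abstract critiques. The function name, threshold choice, and synthetic losses are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of a loss-threshold membership inference attack (MIA).
# Assumption: lower loss on an example suggests it was in the training set.
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold=None):
    """Predict membership by thresholding per-example loss.

    member_losses / nonmember_losses: 1-D arrays of losses computed on
    the model under attack. Returns attack accuracy (0.5 ~ chance level).
    """
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses),
                             np.zeros_like(nonmember_losses)])
    if threshold is None:
        threshold = np.median(losses)  # crude global threshold
    preds = (losses < threshold).astype(int)  # low loss => predicted member
    return (preds == labels).mean()

# Example usage with synthetic losses (hypothetical numbers, not paper data).
rng = np.random.default_rng(0)
members = rng.gamma(shape=1.0, scale=0.5, size=1000)     # members: lower loss
nonmembers = rng.gamma(shape=2.0, scale=0.7, size=1000)  # nonmembers: higher
print(f"attack accuracy: {loss_threshold_mia(members, nonmembers):.3f}")
```

An attack accuracy near 0.5 under such a weak baseline is exactly the kind of evidence the abstract cautions against over-reading: stronger, more realistic attacks may still distinguish members from nonmembers.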