Machine learning models face growing concerns about the storage of personal user data and about the adverse effects of corrupted data, such as backdoors or systematic bias. Machine unlearning can address these concerns by allowing affected training data to be deleted from a learned model after training. Performing this deletion exactly is computationally expensive. Consequently, recent works have proposed inexact unlearning algorithms that solve the problem approximately, along with evaluation methods to test how effective these algorithms are. In this work, we first outline some necessary criteria for evaluation methods and show that no existing evaluation satisfies them all. We then design a stronger black-box evaluation method, the Interclass Confusion (IC) test, which adversarially manipulates data during training to detect insufficient unlearning. We also propose two analytically motivated baseline methods (EU-k and CF-k) that outperform several popular inexact unlearning methods. Overall, we demonstrate how adversarial evaluation strategies can help analyze various unlearning phenomena and thereby guide the development of stronger unlearning algorithms.
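The abstract does not spell out how the IC test manipulates data, so the following is only a minimal sketch under one plausible reading, not the paper's reference implementation. It assumes the manipulation swaps the labels of a few training samples between two chosen classes, treats those swapped samples as the forget set, and scores a model by how often it confuses the two classes on test data. The function names `make_confused_labels` and `interclass_confusion`, the `n_per_class` and `seed` parameters, and the metric itself are hypothetical choices made for illustration. The unlearning step is not shown.

```python
import random
from typing import List, Sequence, Tuple


def make_confused_labels(
    labels: Sequence[int], class_a: int, class_b: int, n_per_class: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Swap the labels of n_per_class random samples from class_a and class_b.

    Hypothetical helper. Returns the manipulated label list (used for training)
    and the sorted indices of the swapped samples, assumed here to form the
    forget set that the unlearning procedure must remove.
    """
    rng = random.Random(seed)
    idx_a = [i for i, y in enumerate(labels) if y == class_a]
    idx_b = [i for i, y in enumerate(labels) if y == class_b]
    # random.sample raises ValueError if a class has fewer than n_per_class samples.
    chosen_a = rng.sample(idx_a, n_per_class)
    chosen_b = rng.sample(idx_b, n_per_class)
    new_labels = list(labels)
    for i in chosen_a:
        new_labels[i] = class_b
    for i in chosen_b:
        new_labels[i] = class_a
    return new_labels, sorted(chosen_a + chosen_b)


def interclass_confusion(
    preds: Sequence[int], true_labels: Sequence[int], class_a: int, class_b: int
) -> float:
    """Fraction of test samples from class_a or class_b predicted as the other class.

    Hypothetical metric. It needs only the model's predictions, so the model is
    treated as a black box.
    """
    swap = {class_a: class_b, class_b: class_a}
    pairs = [(p, y) for p, y in zip(preds, true_labels) if y in swap]
    if not pairs:
        return 0.0
    return sum(1 for p, y in pairs if p == swap[y]) / len(pairs)
```

For example, `interclass_confusion([0, 1, 1, 0, 2], [0, 0, 1, 1, 2], 0, 1)` returns `0.5`, because two of the four class-0/class-1 test samples are predicted as the other class. Under this sketch's assumptions, one would compare the score of the unlearned model with that of a model retrained from scratch without the forget set; a noticeably higher score would suggest that the injected confusion, and hence the forget set, was not fully unlearned.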