We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. Beyond complying with data deletion requests, an often-cited potential application of unlearning methods is removing the effects of poisoned data. We experimentally demonstrate that, while existing unlearning methods have proven effective in a number of settings, they fail to remove the effects of data poisoning across a variety of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget. To precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful for efficiently removing poisoned data without retraining, our work suggests that these methods are not yet ``ready for prime time,'' and currently provide limited benefit over retraining.