Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Approximate Unlearning Completeness

By adopting a more flexible definition of unlearning and adjusting the model distribution to simulate training without the targeted data, approximate machine unlearning offers a less resource-intensive alternative to the more laborious exact unlearning methods. Yet the unlearning completeness of target samples, even when approximate algorithms are executed faithfully and without external threats, remains largely unexamined. This raises the question of whether these algorithms can fulfill their unlearning commitments throughout the lifecycle. In this paper, we introduce the task of Lifecycle Unlearning Commitment Management (LUCM) for approximate unlearning and outline its primary challenges. We propose an efficient metric for assessing sample-level unlearning completeness. Our empirical results show that it outperforms membership inference techniques in two key respects: its measurements correlate strongly with unlearning completeness across various unlearning tasks, and it is computationally efficient enough for real-time applications. We further show that the metric can serve as a tool for monitoring unlearning anomalies throughout the unlearning lifecycle, including both under-unlearning and over-unlearning. We then apply the metric to evaluate the unlearning commitments of current approximate algorithms. Our analysis across multiple unlearning benchmarks reveals that these algorithms fulfill their unlearning commitments inconsistently, for two main reasons: 1) unlearning new data can significantly affect the unlearning utility of previously requested data, and 2) approximate algorithms fail to ensure equitable unlearning utility across different groups. These insights underscore the importance of LUCM throughout the unlearning lifecycle. We will soon open-source our newly developed benchmark.

翻译：通过采用更灵活的遗忘定义并调整模型分布以模拟不包含目标数据的训练过程，近似机器遗忘提供了一种比更繁琐的精确遗忘方法资源消耗更低的替代方案。然而，即使近似算法在没有外部威胁的情况下忠实执行，目标样本的遗忘完备性在很大程度上仍未得到检验，这引发了关于这些近似算法在生命周期中履行其遗忘承诺能力的疑问。本文提出了面向近似遗忘的生命周期遗忘承诺管理（LUCM）任务，并概述了其主要挑战。我们提出了一种高效的度量指标，用于评估样本级的遗忘完备性。实验结果表明，该指标在两个方面优于成员推断技术：其测量结果与各种遗忘任务中的遗忘完备性高度相关，以及其计算效率使其适用于实时应用。此外，我们展示了该指标能够作为监测遗忘生命周期中遗忘异常（包括欠遗忘和过遗忘）的工具。我们将该指标应用于评估当前近似算法的遗忘承诺。在多个遗忘基准测试中进行的分析揭示，这些算法因两个主要问题而未能一致地履行其遗忘承诺：1）遗忘新数据会显著影响先前请求数据的遗忘效用，2）近似算法无法确保不同组之间的公平遗忘效用。这些发现强调了LUCM在整个遗忘生命周期中的关键重要性。我们即将开源新开发的基准测试。