Machine unlearning techniques retract data records and reduce the influence of those records on trained models. They support user privacy protection but incur significant computational costs. Weight perturbation-based unlearning is a general approach, but it typically modifies the parameters globally. We propose fine-grained inexact machine unlearning strategies that perturb only Top-K or Random-k parameters, addressing privacy needs while keeping computational costs tractable. To demonstrate the efficacy of these strategies, we also tackle the challenge of evaluating machine unlearning by considering the model's generalization performance on both the unlearning data and the remaining data. To better assess the unlearning effect and model generalization, we propose two metrics: the forgetting rate and the memory retention rate. For inexact machine unlearning, however, existing metrics cannot adequately quantify the degree of forgetting that occurs after an unlearning strategy is applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of the data targeted for unlearning. We then evaluate the degree of unlearning by measuring the difference in model performance on the perturbed unlearning data before and after the unlearning process. With these techniques and metrics, we achieve computationally efficient privacy protection in machine learning applications without a significant sacrifice in model performance, and we provide a new method for evaluating the degree of unlearning.
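The contrast between global perturbation and the Top-K and Random-k strategies can be illustrated with a minimal sketch. The abstract does not specify how the Top-K parameters are ranked or what form the perturbation takes, so the sketch assumes a hypothetical per-parameter sensitivity score (for instance, the absolute gradient of the loss on the data to be unlearned) and additive Gaussian noise. All function names and values are illustrative, not the authors' implementation. The last helper mirrors the evaluation idea of comparing performance on the perturbed unlearning data before and after unlearning; its sign convention is also an assumption.

```python
import random


def top_k_indices(scores, k):
    """Indices of the k largest scores (ties broken by lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


def random_k_indices(n, k, rng):
    """k distinct indices drawn uniformly at random from range(n)."""
    return sorted(rng.sample(range(n), k))


def perturb(params, indices, sigma, rng):
    """Copy params and add Gaussian noise at the selected indices only.

    Additive Gaussian noise is an assumption of this sketch.
    """
    out = list(params)
    for i in indices:
        out[i] += rng.gauss(0.0, sigma)
    return out


def unlearning_degree(score_before, score_after):
    """Performance drop on the perturbed unlearning data (before minus after).

    The sign convention is an assumption: a larger value means more forgetting.
    """
    return score_before - score_after


rng = random.Random(0)
params = [0.5, -1.2, 0.3, 2.0, -0.7, 0.1]
# Hypothetical sensitivity of each parameter to the data being unlearned.
sensitivity = [0.02, 0.90, 0.05, 0.40, 0.75, 0.01]

top_idx = top_k_indices(sensitivity, k=2)
rand_idx = random_k_indices(len(params), k=2, rng=rng)

top_perturbed = perturb(params, top_idx, sigma=0.1, rng=rng)
rand_perturbed = perturb(params, rand_idx, sigma=0.1, rng=rng)

print(top_idx)  # [1, 4]
```

In both variants only `k` of the parameters are touched, which is where the computational saving over a global perturbation would come from in this simplified picture.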