Machine unlearning through fine-grained model parameters perturbation

Machine unlearning techniques, which retract data records and reduce their influence on trained models, support user privacy protection but incur significant computational costs. Weight perturbation-based unlearning is a general approach, but it typically modifies the parameters globally. We propose fine-grained Top-K and Random-k parameter perturbation strategies for inexact machine unlearning that meet privacy needs while keeping computational costs tractable. To demonstrate the efficacy of these strategies, we also tackle the challenge of evaluating machine unlearning by considering the model's generalization performance on both the unlearning data and the remaining data. To better assess the unlearning effect and model generalization, we propose two novel metrics: the forgetting rate and the memory retention rate. For inexact machine unlearning, however, existing metrics are inadequate for quantifying the degree of forgetting that occurs after an unlearning strategy is applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of the data targeted for unlearning. We then evaluate the degree of unlearning by measuring the performance difference of the models on the perturbed unlearning data before and after the unlearning process. With these techniques and metrics, we achieve computationally efficient privacy protection in machine learning applications without significant sacrifice of model performance. Furthermore, this approach provides a novel method for evaluating the degree of unlearning.

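The abstract does not say how the K parameters are chosen or how they are updated. The sketch below is a minimal illustration under several assumptions. Top-K is assumed to rank parameters by the magnitude of the gradient of the loss on the unlearning data. Random-k is assumed to sample indices uniformly. The selected parameters are assumed to receive one gradient-ascent step. The function names `top_k_indices`, `random_k_indices` and `perturb_selected`, and the step size `lr`, are hypothetical.

```python
import random
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k entries with the largest absolute score.

    Assumption (not from the source): the score is the gradient of the
    loss on the unlearning data with respect to each parameter.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])


def random_k_indices(n: int, k: int, seed: int = 0) -> List[int]:
    """Return k distinct parameter indices drawn uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), k))


def perturb_selected(params: Sequence[float], grads: Sequence[float],
                     indices: Sequence[int], lr: float = 0.1) -> List[float]:
    """Apply one gradient-ascent step (raising the loss on the unlearning
    data) to the selected parameters only; all others are copied unchanged.

    Assumption: the update rule is a hypothetical choice for illustration.
    """
    updated = list(params)
    for i in indices:
        updated[i] = params[i] + lr * grads[i]
    return updated


if __name__ == "__main__":
    params = [0.5, -1.2, 0.3, 2.0, -0.7]
    grads = [0.01, -0.9, 0.4, 0.05, 0.6]  # hypothetical unlearning-set gradient
    print(perturb_selected(params, grads, top_k_indices(grads, 2)))
    print(perturb_selected(params, grads, random_k_indices(len(params), 2)))
```

In both variants only K parameters change. Under these assumptions, this is where the saving over a global perturbation would come from. Top-K pays for one ranking pass to choose its K parameters, while Random-k skips that pass.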

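The abstract names the forgetting rate and the memory retention rate but does not give their formulas. The definitions in this sketch are therefore hypothetical. The forgetting rate is taken as the relative accuracy drop on the unlearning data. The memory retention rate is taken as the ratio of accuracy on the remaining data after unlearning to accuracy before unlearning. The helper names are illustrative only.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def forgetting_rate(acc_forget_before: float, acc_forget_after: float) -> float:
    """Hypothetical definition: relative accuracy drop on the unlearning data."""
    if acc_forget_before <= 0:
        raise ValueError("accuracy before unlearning must be positive")
    return (acc_forget_before - acc_forget_after) / acc_forget_before


def memory_retention_rate(acc_retain_before: float, acc_retain_after: float) -> float:
    """Hypothetical definition: share of remaining-data accuracy kept after unlearning."""
    if acc_retain_before <= 0:
        raise ValueError("accuracy before unlearning must be positive")
    return acc_retain_after / acc_retain_before


if __name__ == "__main__":
    forget_y = [0, 1, 1, 0]
    retain_y = [1, 0, 1, 1, 0]
    fr = forgetting_rate(accuracy([0, 1, 1, 0], forget_y),   # original model: 1.0
                         accuracy([1, 1, 0, 0], forget_y))   # unlearned model: 0.5
    mr = memory_retention_rate(accuracy([1, 0, 1, 1, 1], retain_y),  # before: 0.8
                               accuracy([1, 0, 1, 0, 1], retain_y))  # after: 0.6
    print(fr, mr)
```

Reporting both numbers side by side reflects the trade-off the abstract describes. One number shows how much the unlearning data is forgotten. The other shows how much performance on the remaining data is kept.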

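The SPD-GAN evaluation measures the performance difference on perturbed unlearning data before and after unlearning. This sketch outlines that measurement but does not implement SPD-GAN itself. As a clearly labeled stand-in, `perturb_samples` adds small Gaussian noise to each feature, where the real method would apply a GAN-learned distribution shift. The degree of unlearning is assumed to be the accuracy of the original model minus the accuracy of the unlearned model on the perturbed data. Models are modeled as plain callables, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def perturb_samples(samples: Sequence[Sequence[float]], sigma: float = 0.01,
                    seed: int = 0) -> List[List[float]]:
    """Placeholder for SPD-GAN: add small Gaussian noise to every feature.
    SPD-GAN learns its perturbation; this stand-in only adds noise."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in s] for s in samples]


def accuracy(model: Model, samples: Sequence[Sequence[float]],
             labels: Sequence[int]) -> float:
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(samples, labels)) / len(labels)


def unlearning_degree(original: Model, unlearned: Model,
                      samples: Sequence[Sequence[float]], labels: Sequence[int],
                      sigma: float = 0.01, seed: int = 0) -> float:
    """Assumed measure: accuracy before unlearning minus accuracy after
    unlearning, both evaluated on the same perturbed unlearning data."""
    perturbed = perturb_samples(samples, sigma, seed)
    return accuracy(original, perturbed, labels) - accuracy(unlearned, perturbed, labels)


if __name__ == "__main__":
    forget_x = [[0.9], [0.8], [0.1]]
    forget_y = [1, 1, 0]
    original = lambda x: int(x[0] > 0.5)  # toy model that still fits the unlearning data
    unlearned = lambda x: 0               # toy model that no longer predicts class 1
    print(unlearning_degree(original, unlearned, forget_x, forget_y))
```

Both models are scored on the same perturbed samples. The difference therefore reflects the effect of unlearning rather than the perturbation itself.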