Machine unlearning through fine-grained model parameters perturbation

Machine unlearning techniques, which retract data records and reduce the influence of those records on trained models, support user privacy protection but incur significant computational costs. Weight perturbation-based unlearning is a general approach, but it typically modifies the parameters globally. We propose fine-grained Top-K and Random-k parameter perturbation strategies for inexact machine unlearning that address privacy needs while keeping computational costs tractable. To demonstrate the efficacy of our strategies, we also tackle the challenge of evaluating machine unlearning by considering the model's generalization performance on both the unlearning data and the remaining data. To better assess the unlearning effect and model generalization, we propose two novel metrics: the forgetting rate and the memory retention rate. For inexact machine unlearning, however, current metrics are inadequate for quantifying the degree of forgetting that occurs after an unlearning strategy is applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of the data targeted for unlearning. We then evaluate the degree of unlearning by measuring the difference in model performance on the perturbed unlearning data before and after the unlearning process. With these techniques and metrics, we achieve computationally efficient privacy protection in machine learning applications without significantly sacrificing model performance. The approach also provides a novel method for evaluating the degree of unlearning.
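As a rough illustration of the difference between global perturbation and the fine-grained Top-K and Random-k strategies named above, the following is a minimal sketch only. The abstract does not say how the Top-K parameters are ranked or what noise is applied, so the choices here are assumptions: parameters are ranked by a caller-supplied score (for example, the gradient magnitude of the loss on the data to be unlearned), and the perturbation is additive Gaussian noise with a hypothetical scale `sigma`. The function names `select_indices` and `perturb` are illustrative and do not come from the paper.

```python
import numpy as np


def select_indices(scores, k, strategy, rng):
    """Pick the indices of the k parameters to perturb.

    scores: 1-D array, one non-negative score per parameter (assumed ranking
            signal, e.g. |gradient| on the unlearning data).
    """
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and the number of parameters")
    if strategy == "top_k":
        # Indices of the k largest scores (order within the k is arbitrary).
        return np.argpartition(scores, -k)[-k:]
    if strategy == "random_k":
        # k distinct indices chosen uniformly at random.
        return rng.choice(scores.size, size=k, replace=False)
    raise ValueError(f"unknown strategy: {strategy}")


def perturb(params, scores, k, strategy="top_k", sigma=0.1, seed=0):
    """Return a perturbed copy of params and the indices that were changed.

    Only k entries receive noise; all other parameters are left untouched,
    which is what makes the perturbation fine-grained rather than global.
    """
    rng = np.random.default_rng(seed)
    idx = select_indices(np.abs(scores), k, strategy, rng)
    out = np.array(params, dtype=float, copy=True)
    out[idx] += rng.normal(0.0, sigma, size=k)  # assumed Gaussian noise
    return out, idx


if __name__ == "__main__":
    params = np.zeros(10)
    scores = np.arange(10.0)  # stand-in scores; higher means more influential
    new_params, changed = perturb(params, scores, k=3, strategy="top_k")
    print("perturbed indices:", sorted(changed.tolist()))
```

In a real model the flat `params` vector would correspond to the flattened weights of one or more layers, and a short fine-tuning pass on the remaining data would typically follow the perturbation; neither step is shown here.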

