Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning techniques remove data efficiently and at low cost, recent research highlights vulnerabilities where malicious users could request unlearning on manipulated data to compromise the model. Although these attacks are effective, the perturbed data differs from the original training data and therefore fails hash verification. Existing attacks on machine unlearning also suffer from practical limitations and require substantial additional knowledge and resources. To fill the gaps in current unlearning attacks, we introduce the Unlearning Usability Attack. This model-agnostic, unlearning-agnostic, and budget-friendly attack distills data distribution information into a small set of benign data. These data are identified as benign by automatic poisoning detection tools because of their positive impact on model training. Although these data are benign for machine learning, unlearning them significantly degrades the model. Our evaluation demonstrates that unlearning this benign data, comprising no more than 1% of the total training data, can reduce model accuracy by up to 50%. Furthermore, our findings show that well-prepared benign data poses challenges for recent unlearning techniques, as erasing these synthetic instances demands more resources than erasing regular data. These insights underscore the need for future research to reconsider "data poisoning" in the context of machine unlearning.
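To give a concrete feel for what "distilling data distribution information into a small set of data" can mean, the following is a minimal, hypothetical sketch: a toy two-class dataset is summarized by a handful of synthetic points via crude per-class distribution matching, and a classifier trained only on those points still performs well on the full data. The toy data, the distribution-matching scheme, and the logistic model are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" training set: two well-separated Gaussian blobs
# (a hypothetical stand-in for a real dataset).
n = 500
X_real = np.vstack([rng.normal(-1.0, 0.5, (n, 2)),
                    rng.normal(+1.0, 0.5, (n, 2))])
y_real = np.hstack([np.zeros(n), np.ones(n)])

def distill(X, y, k_per_class=5):
    """Crude distillation: sample a few points per class from a
    Gaussian fitted to that class's mean and per-axis std."""
    syn_X, syn_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mu, sd = Xc.mean(axis=0), Xc.std(axis=0)
        pts = mu + sd * rng.normal(0.0, 1.0, (k_per_class, X.shape[1]))
        syn_X.append(pts)
        syn_y.append(np.full(k_per_class, c))
    return np.vstack(syn_X), np.hstack(syn_y)

X_syn, y_syn = distill(X_real, y_real)  # 10 points ~ 1% of the data

def train(X, y, steps=500, lr=0.5):
    """Plain gradient-descent logistic regression."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Train only on the distilled points, evaluate on the full real data.
w, b = train(X_syn, y_syn)
acc = (((X_real @ w + b) > 0) == (y_real == 1)).mean()
print(f"accuracy on real data from {len(X_syn)} distilled points: {acc:.2f}")
```

Because the distilled points carry so much of the distribution's information, a model that later unlearns them loses far more than a same-sized slice of ordinary training data would suggest, which is the intuition behind the attack.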