Machine unlearning enables data holders to remove the contribution of their specified samples from trained models, thereby protecting their privacy. Paradoxically, however, most unlearning methods require the unlearning requesters to first upload their data to the server as a prerequisite for unlearning. Such methods are infeasible in many privacy-preserving scenarios where servers are prohibited from accessing users' data, such as federated learning (FL). In this paper, we explore how to perform unlearning without revealing the data to be erased to the server. We propose \textbf{Blind Unlearning (BlindU)}, which carries out unlearning on compressed representations instead of original inputs. BlindU involves only the server and the unlearning user: the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels. For FL model training, we employ the information bottleneck (IB) mechanism: the encoder of the IB-based FL model learns representations that discard as much task-irrelevant information from the inputs as possible, allowing FL users to generate compressed representations locally. For effective unlearning on compressed representations, BlindU integrates two dedicated unlearning modules tailored to IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retention. Although IB compression already protects the task-irrelevant information of the inputs, we further introduce a noise-free differential privacy (DP) masking method that processes the raw data to be erased before compression. Theoretical analysis and extensive experimental results demonstrate the superiority of BlindU in both privacy protection and unlearning effectiveness over the best existing privacy-preserving unlearning baselines.
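As a minimal sketch of the IB mechanism referenced above (the standard variational IB form; the symbols $X$, $Z$, $Y$ and the trade-off weight $\beta$ are conventional notation, not taken from this abstract), the encoder $p_\theta(z\mid x)$ is trained to retain label-relevant information while compressing away the rest:
\[
\min_{\theta}\;\; \beta\, I(Z;X) - I(Z;Y),
\]
which is typically optimized through the variational upper bound
\[
\mathcal{L}_{\mathrm{IB}} \;=\; \mathbb{E}\big[-\log q_\phi(y\mid z)\big] \;+\; \beta\, \mathrm{KL}\big(p_\theta(z\mid x)\,\big\|\,r(z)\big),
\]
where $q_\phi(y\mid z)$ is a variational decoder and $r(z)$ a fixed prior over representations. Larger $\beta$ compresses more task-irrelevant input information out of $Z$, which is what lets users share $Z$ instead of $X$.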
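The multiple gradient descent idea used to balance forgetting and utility retention can be illustrated with the closed-form two-task min-norm solver; this is a generic sketch of that technique, not the paper's implementation, and the gradient names below are purely illustrative:

```python
import numpy as np

def min_norm_weight(g1, g2):
    """Two-task MGDA weight: minimize ||a*g1 + (1-a)*g2||^2 over a in [0, 1].

    The resulting convex combination is a common-descent direction for both
    objectives whenever one exists.
    """
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:  # identical gradients: any weight gives the same direction
        return 0.5
    a = float(np.dot(g2 - g1, g2)) / denom  # unconstrained minimizer
    return float(np.clip(a, 0.0, 1.0))     # project onto the simplex [0, 1]

# Toy gradients standing in for a "forgetting" loss and a "utility" loss.
g_forget = np.array([1.0, 0.0])
g_retain = np.array([0.0, 1.0])

a = min_norm_weight(g_forget, g_retain)
d = a * g_forget + (1.0 - a) * g_retain  # combined update direction
```

For these orthogonal toy gradients the solver returns $a = 0.5$, and the combined direction `d` has positive inner product with both gradients, i.e. stepping along it decreases both losses at once rather than trading one for the other.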