Machine unlearning enables data holders to remove the contribution of their specified samples from trained models, thereby protecting their privacy. Paradoxically, however, most unlearning methods require the unlearning requesters to first upload their data to the server as a prerequisite for unlearning. Such methods are infeasible in many privacy-preserving scenarios where servers are prohibited from accessing users' data, such as federated learning (FL). In this paper, we explore how to perform unlearning without revealing the data to be erased to the server. We propose \textbf{Blind Unlearning (BlindU)}, which carries out unlearning on compressed representations instead of original inputs. BlindU involves only the server and the unlearning user: the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels. For FL model training, we employ the information bottleneck (IB) mechanism: the encoder of the IB-based FL model learns representations that discard as much task-irrelevant information from the inputs as possible, allowing FL users to generate compressed representations locally. For effective unlearning on compressed representations, BlindU integrates two dedicated unlearning modules tailored to IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retention. Although IB compression already protects the task-irrelevant information of the inputs, we further introduce a noise-free differential privacy (DP) masking method that processes the raw data to be erased before compression. Theoretical analysis and extensive experimental results demonstrate the superiority of BlindU in both privacy protection and unlearning effectiveness over the best existing privacy-preserving unlearning baselines.
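As a minimal sketch of the IB mechanism referenced above (the standard variational IB form; the symbols $X$, $Z$, $Y$ and the trade-off weight $\beta$ are conventional notation, not taken from this abstract), the encoder $p_\theta(z\mid x)$ is trained to retain label-relevant information while compressing away the rest:
\[
\min_{\theta}\;\; \beta\, I(Z;X) - I(Z;Y),
\]
which is typically optimized through the variational upper bound
\[
\mathcal{L}_{\mathrm{IB}} \;=\; \mathbb{E}\big[-\log q_\phi(y\mid z)\big] \;+\; \beta\, \mathrm{KL}\big(p_\theta(z\mid x)\,\big\|\,r(z)\big),
\]
where $q_\phi(y\mid z)$ is a variational decoder and $r(z)$ a fixed prior over representations. Larger $\beta$ compresses more task-irrelevant input information out of $Z$, which is what lets users share $Z$ instead of $X$.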
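The multiple gradient descent idea used to balance forgetting and utility retention can be illustrated with the closed-form two-task min-norm solver; this is a generic sketch of that technique, not the paper's implementation, and the gradient names below are purely illustrative:

```python
import numpy as np

def min_norm_weight(g1, g2):
    """Two-task MGDA weight: minimize ||a*g1 + (1-a)*g2||^2 over a in [0, 1].

    The resulting convex combination is a common-descent direction for both
    objectives whenever one exists.
    """
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:  # identical gradients: any weight gives the same direction
        return 0.5
    a = float(np.dot(g2 - g1, g2)) / denom  # unconstrained minimizer
    return float(np.clip(a, 0.0, 1.0))     # project onto the simplex [0, 1]

# Toy gradients standing in for a "forgetting" loss and a "utility" loss.
g_forget = np.array([1.0, 0.0])
g_retain = np.array([0.0, 1.0])

a = min_norm_weight(g_forget, g_retain)
d = a * g_forget + (1.0 - a) * g_retain  # combined update direction
```

For these orthogonal toy gradients the solver returns $a = 0.5$, and the combined direction `d` has positive inner product with both gradients, i.e. stepping along it decreases both losses at once rather than trading one for the other.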