Federated Unlearning: A Survey on Methods, Design Guidelines, and Evaluation Metrics

Federated learning (FL) enables collaborative training of a machine learning (ML) model across multiple parties, preserving the privacy of users and institutions by keeping data stored locally. Instead of centralizing raw data, FL exchanges locally refined model parameters to build a global model incrementally. While FL is more compliant with emerging regulations such as the European General Data Protection Regulation (GDPR), it remains unclear how to ensure the right to be forgotten in this context, that is, how to let FL participants remove their data contributions from the learned model. In addition, malicious clients are known to be able to inject backdoors into the global model through their updates, e.g., to cause mispredictions on specially crafted data examples. Consequently, there is a need for mechanisms that let individuals remove their data, and that erase malicious contributions, even after aggregation, without compromising the "good" knowledge already acquired. This calls for novel federated unlearning (FU) algorithms, which can efficiently remove specific clients' contributions without full model retraining. This article provides background concepts, empirical evidence, and practical guidelines for designing and implementing efficient FU schemes. It includes a detailed analysis of the metrics for evaluating unlearning in FL and presents an in-depth literature review that categorizes state-of-the-art FU contributions under a novel taxonomy. Finally, we outline the most relevant open technical challenges and identify the most promising research directions in the field.

