Federated learning (FL) enables multiple parties to collaboratively train a machine learning (ML) model while keeping their data stored locally, which helps preserve the privacy of users and institutions. Instead of centralizing raw data, FL exchanges locally refined model parameters to build a global model incrementally. While FL is more compliant with emerging regulations such as the European General Data Protection Regulation (GDPR), how to ensure the right to be forgotten in this context - that is, allowing FL participants to remove their data contributions from the learned model - remains unclear. In addition, malicious clients are known to be able to inject backdoors into the global model through their updates, e.g., to cause mispredictions on specially crafted data examples. Consequently, there is a need for mechanisms that can guarantee individuals the ability to remove their data and to erase malicious contributions even after aggregation, without compromising the "good" knowledge already acquired. This highlights the necessity for novel federated unlearning (FU) algorithms, which can efficiently remove specific clients' contributions without full model retraining. This article provides background concepts, empirical evidence, and practical guidelines for designing and implementing efficient FU schemes. It includes a detailed analysis of the metrics for evaluating unlearning in FL and presents an in-depth literature review that categorizes state-of-the-art FU contributions under a novel taxonomy. Finally, we outline the most relevant open technical challenges and identify the most promising research directions in the field.