We study federated unlearning, the problem of eliminating the influence of specific clients or data points on a global model trained via federated learning (FL). The problem is motivated by the right to be forgotten and by the privacy challenges in FL. We introduce a new framework for exact federated unlearning that meets two essential criteria: \textit{communication efficiency} and \textit{exact unlearning provability}. To our knowledge, this is the first work to address both criteria jointly. We start by giving a rigorous definition of \textit{exact} federated unlearning, which guarantees that the unlearned model is statistically indistinguishable from a model trained without the deleted data. We then identify the key property that enables fast exact federated unlearning: total variation (TV) stability, which measures the sensitivity of the model parameters to slight changes in the dataset. Leveraging this insight, we develop a TV-stable FL algorithm called \texttt{FATS}, which modifies the classical \texttt{\underline{F}ed\underline{A}vg} algorithm for \underline{T}V \underline{S}tability and employs local SGD with periodic averaging to reduce the number of communication rounds. We also design efficient unlearning algorithms for \texttt{FATS} under two settings: client-level and sample-level unlearning. We provide theoretical guarantees for our learning and unlearning algorithms, proving that they achieve exact federated unlearning with reasonable convergence rates for both the original and the unlearned models. We empirically validate our framework on six benchmark datasets and show that it outperforms state-of-the-art methods in terms of accuracy, communication cost, computation cost, and unlearning efficacy.
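To make "local SGD with periodic averaging" concrete, here is a minimal sketch of the plain \texttt{FedAvg}-style loop that the abstract takes as its starting point. It is not \texttt{FATS}: the TV-stability modifications and the unlearning algorithms are not specified in the abstract and are deliberately left out. The function names (`local_sgd`, `fedavg_periodic`, `make_clients`), the 1-D least-squares objective, and the synthetic data are all assumptions made for illustration.

```python
import random


def local_sgd(w, data, lr, steps, rng):
    """Run `steps` single-sample SGD steps on the loss 0.5 * (w*x - y)^2."""
    for _ in range(steps):
        x, y = rng.choice(data)      # sample one local data point
        grad = (w * x - y) * x       # gradient of the squared loss w.r.t. w
        w -= lr * grad
    return w


def fedavg_periodic(clients, rounds, local_steps, lr=0.5, seed=0):
    """Plain local SGD with periodic averaging (illustrative, not FATS).

    Each round, every client starts from the current global model, runs
    `local_steps` SGD steps on its own data, and the server averages the
    results. Only one model exchange happens per round, which is why
    larger `local_steps` means fewer communication rounds for the same
    number of gradient steps.
    """
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_models = [
            local_sgd(w_global, data, lr, local_steps, rng) for data in clients
        ]
        w_global = sum(local_models) / len(local_models)  # periodic averaging
    return w_global


def make_clients(num_clients=4, samples=20, true_w=2.0, seed=1):
    """Hypothetical noise-free synthetic data: y = true_w * x on every client."""
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
        clients.append([(x, true_w * x) for x in xs])
    return clients


clients = make_clients()
w_full = fedavg_periodic(clients, rounds=50, local_steps=5)

# Naive baseline for client-level unlearning: retrain from scratch without
# client 0. Exact unlearning asks for a model distributed like this one,
# ideally without paying the full retraining cost.
w_retrained = fedavg_periodic(clients[1:], rounds=50, local_steps=5)
```

In this sketch, `w_retrained` is the retrain-from-scratch reference that the abstract's definition of exact unlearning compares against. The point of a TV-stable training procedure, as described in the abstract, is to reach a statistically indistinguishable model at lower cost than this full retraining.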