Federated learning (FL), introduced in 2017, facilitates collaborative learning between non-trusting parties without requiring them to explicitly share their data with one another. This allows training models on user data while respecting privacy regulations such as GDPR and CPRA. However, emerging privacy requirements may mandate that model owners be able to \emph{forget} some learned data, e.g., when requested by data owners or law enforcement. This has given birth to an active field of research called \emph{machine unlearning}. In the context of FL, many techniques developed for unlearning in centralized settings are not trivially applicable. This is due to fundamental differences between centralized and distributed learning, in particular, interactivity, stochasticity, heterogeneity, and limited accessibility in FL. In response, a recent line of work has focused on developing unlearning mechanisms tailored to FL. This SoK paper aims to take a deep look at the \emph{federated unlearning} literature, with the goal of identifying research trends and challenges in this emerging field. By carefully categorizing papers published on federated unlearning since 2020, we aim to pinpoint the unique complexities of federated unlearning and highlight the limitations of directly applying centralized unlearning methods. We compare existing federated unlearning methods with respect to influence removal and performance recovery, contrast their threat models and assumptions, and discuss their implications and limitations. In addition, we analyze the experimental setups of federated unlearning studies from various perspectives, including data heterogeneity and its simulation, the datasets used for demonstration, and the evaluation metrics. Our work aims to offer insights and suggestions for future research on federated unlearning.
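To make the setting concrete, the following is a minimal sketch (not taken from the paper) of FedAvg-style training on a toy least-squares task, together with the naive unlearning baseline that motivates the field: retraining from scratch with the forget-target client excluded. All function names (`local_update`, `fedavg`, `unlearn_by_retraining`) and hyperparameters are illustrative assumptions; practical federated unlearning methods aim to approximate this retrained model at far lower cost.

```python
import numpy as np

def local_update(model, data, lr=0.1, epochs=5):
    """One client's local SGD steps on its private (X, y) data (MSE loss)."""
    X, y = data
    w = model.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5*||Xw - y||^2 / n
        w -= lr * grad
    return w

def fedavg(clients, rounds=20, dim=2):
    """Server loop: broadcast the global model, collect local updates,
    and average them weighted by each client's dataset size."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_models = [local_update(w, c) for c in clients]
        w = np.average(local_models, axis=0, weights=sizes)
    return w

def unlearn_by_retraining(clients, forget_idx, **kw):
    """Naive unlearning baseline: drop the target client and retrain
    the federation from scratch on the remaining clients."""
    kept = [c for i, c in enumerate(clients) if i != forget_idx]
    return fedavg(kept, **kw)
```

Retraining gives an exact guarantee (the forgotten client's data never influences the new model) but repeats the full interactive FL protocol, which is exactly the cost that federated unlearning methods try to avoid.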