Federated Unlearning (FU) aims to delete specific training data from an ML model trained using Federated Learning (FL). We introduce QuickDrop, an efficient and original FU method that utilizes dataset distillation (DD) to accelerate unlearning and drastically reduce computational overhead compared to existing approaches. In QuickDrop, each client uses DD to generate a compact dataset representative of its original training dataset, called a distilled dataset, and uses this compact dataset during unlearning. To unlearn specific knowledge from the global model, QuickDrop has clients execute Stochastic Gradient Ascent with samples from the distilled datasets, thus significantly reducing computational overhead compared to conventional FU methods. We further increase the efficiency of QuickDrop by tightly integrating DD into the FL training process. By reusing the gradient updates produced during FL training for DD, the overhead of creating distilled datasets becomes close to negligible. Evaluations on three standard datasets show that, with comparable accuracy guarantees, QuickDrop reduces the duration of unlearning by 463.8x compared to model retraining from scratch and by 65.1x compared to existing FU approaches. We also demonstrate the scalability of QuickDrop with 100 clients and show its effectiveness while handling multiple unlearning operations.
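The core mechanism the abstract describes, performing gradient ascent on a small distilled dataset to push the model's loss up on the data to be forgotten, can be sketched in miniature. This is a toy illustration with a 1-D linear regressor and illustrative names such as `distilled_forget`, not QuickDrop's actual procedure:

```python
# Toy sketch of gradient-ascent unlearning on distilled data.
# A 1-D linear regressor w stands in for the global FL model;
# "distilled_forget" stands in for a client's distilled dataset.

def grad(w, batch):
    # Gradient of the mean squared error 0.5*(w*x - y)^2 over the batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def loss(w, batch):
    return sum(0.5 * (w * x - y) ** 2 for x, y in batch) / len(batch)

# Toy data: two retained points plus one synthetic distilled sample
# representing the client data that must be unlearned.
retain = [(1.0, 2.0), (3.0, 6.0)]
distilled_forget = [(2.0, 5.0)]
full_data = retain + distilled_forget

# Standard training: gradient *descent* on all data.
w = 0.0
for _ in range(200):
    w -= 0.05 * grad(w, full_data)

# Unlearning: a few gradient *ascent* steps on the distilled forget set.
before = loss(w, distilled_forget)
for _ in range(5):
    w += 0.05 * grad(w, distilled_forget)
after = loss(w, distilled_forget)
# Ascent raises the model's loss on the forgotten samples (after > before),
# while the small distilled batch keeps each step cheap.
```

Because the ascent steps run over a handful of distilled samples rather than the full client datasets, each unlearning operation costs a tiny fraction of retraining, which is the source of the speedups the abstract reports.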