Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", machine unlearning for FC methods has become an important problem. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework uses special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework, which addresses the sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously achieve low communication complexity and support secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis of the unlearning performance of our approach. Simulation results show that the new FC framework achieves better clustering performance than previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation of the proposed method is available at https://github.com/thupchnsky/mufc.