Federated learning (FL) is a distributed machine learning paradigm that enables multiple clients to train a model collaboratively without exposing their local data. Among FL schemes, clustering is an effective technique for addressing the heterogeneity issue (i.e., differences in data distribution and computational capability that affect training performance and effectiveness) by grouping participants with similar computational resources or data distributions into clusters. However, intra-cluster data exchange poses privacy risks, while cluster selection and adaptation introduce challenges that may degrade overall performance. To address these challenges, this paper introduces anonymous adaptive clustering, a novel approach that simultaneously strengthens privacy protection and boosts training efficiency. Specifically, an oblivious shuffle-based anonymization method is designed to safeguard user identities and prevent the aggregation server from inferring similarities through clustering. Additionally, to improve performance, we introduce an iteration-based adaptive frequency decay strategy, which leverages variability in clustering probabilities to optimize training dynamics. Building on these techniques, we construct FedCAPrivacy; experiments show that FedCAPrivacy achieves an approximately 7× performance improvement while maintaining strong privacy.
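To make the anonymization idea concrete: the core intuition behind shuffle-based anonymization is that if client updates are permuted before the server sees them, the server cannot link any individual update back to the client that produced it, and aggregation (e.g., averaging) is unaffected because it is order-invariant. The sketch below is a hypothetical illustration of this intuition only; it is not FedCAPrivacy's actual oblivious-shuffle protocol, and the function name and data layout are assumptions.

```python
import random

def shuffle_and_aggregate(updates, seed=None):
    """Illustrative sketch: permute client updates before aggregation
    so the aggregator cannot map an update to the client that sent it.
    `updates` is a list of equal-length gradient/parameter vectors."""
    rng = random.Random(seed)
    shuffled = updates[:]   # copy; positional link to client identity is broken
    rng.shuffle(shuffled)
    # Averaging is order-invariant, so the aggregate is unchanged
    # by the shuffle while individual contributions stay unlinkable.
    n = len(shuffled)
    dim = len(shuffled[0])
    return [sum(u[i] for u in shuffled) / n for i in range(dim)]

# Hypothetical usage: three clients' two-dimensional updates.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = shuffle_and_aggregate(clients, seed=0)
print(avg)  # → [3.0, 4.0]
```

In the paper's setting the shuffle is *oblivious*, i.e., performed so that no single party (including the aggregation server) learns the permutation; the plain in-memory shuffle above only conveys why aggregation tolerates anonymization.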