Federated Learning (FL) trains deep models across edge devices without centralizing raw data, preserving user privacy. However, client heterogeneity slows down convergence and limits global model accuracy. Clustered FL (CFL) mitigates this by grouping clients with similar representations and training a separate model for each cluster. In practice, client data evolves over time, a phenomenon we refer to as data drift, which breaks cluster homogeneity and degrades performance. Data drift can take different forms depending on whether changes occur in the output values, the input features, or the relationship between them. We propose FIELDING, a CFL framework for handling diverse types of data drift with low overhead. FIELDING detects drift at individual clients and performs selective re-clustering to balance cluster quality and model performance, while remaining robust to malicious clients and varying levels of heterogeneity. Experiments show that FIELDING improves final model accuracy by 1.9-5.9% and achieves target accuracy 1.16x-2.23x faster than existing state-of-the-art CFL methods.
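The abstract's core mechanism (detect drift at individual clients, then re-cluster only the drifted ones) can be illustrated with a minimal sketch. The abstract does not specify FIELDING's actual detection statistic, so the use of a total-variation distance between a client's old and new label distributions, the `threshold` value, and all function names here are illustrative assumptions, not the paper's method:

```python
import numpy as np

def label_distribution(labels, num_classes):
    """Normalized class histogram of a client's local labels."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def detect_drift(old_dist, new_dist, threshold=0.3):
    """Flag drift when the total-variation distance between the client's
    previous and current label distributions exceeds a threshold.
    (Illustrative stand-in for FIELDING's actual detector.)"""
    tv = 0.5 * np.abs(old_dist - new_dist).sum()
    return tv > threshold

def select_drifted_clients(history, current, num_classes, threshold=0.3):
    """Return ids of clients whose data drifted; only these clients
    would be re-clustered, leaving stable clusters untouched."""
    drifted = []
    for cid in current:
        old = label_distribution(history[cid], num_classes)
        new = label_distribution(current[cid], num_classes)
        if detect_drift(old, new, threshold):
            drifted.append(cid)
    return drifted
```

Selective re-clustering of this kind keeps overhead low: clients whose distributions are stable never trigger a global re-assignment.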