As Federated Learning (FL) expands, the challenge of non-independent and identically distributed (non-IID) data becomes critical. Clustered Federated Learning (CFL) addresses this by training multiple specialized models, each representing a group of clients with similar data distributions. However, the term "CFL" has increasingly been applied to operational strategies unrelated to data heterogeneity, creating significant ambiguity. This survey provides a systematic review of the CFL literature and introduces a principled taxonomy that classifies algorithms into Server-side, Client-side, and Metadata-based approaches. Our analysis reveals a distinct dichotomy: while theoretical research prioritizes privacy-preserving Server-side and Client-side methods, real-world applications in IoT, mobility, and energy overwhelmingly favor Metadata-based methods for their efficiency. Furthermore, we explicitly distinguish "Core CFL" (grouping clients to handle non-IID data) from "Clustered X FL" (operational variants that address system heterogeneity). Finally, we outline lessons learned and future directions to bridge the gap between theoretical privacy and practical efficiency.