Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (CFL) can improve performance by grouping similar clients and training cluster-wise models. However, most CFL approaches rely on multiple communication rounds for cluster estimation and model updates, which limits their practicality under tight constraints on communication rounds. We propose Data Collaboration-based Clustered Federated Learning (DC-CFL), a single-round framework that completes both client clustering and cluster-wise learning using only the information shared in DC analysis. DC-CFL quantifies inter-client similarity via total variation distance between label distributions, estimates clusters using hierarchical clustering, and performs cluster-wise learning via DC analysis. Experiments on multiple open datasets under representative non-IID conditions show that DC-CFL achieves accuracy comparable to multi-round baselines while requiring only one communication round. These results indicate that DC-CFL is a practical alternative for collaborative AI model development when multiple communication rounds are impractical. Our source code is publicly available at https://github.com/souta-suga/DC-CFL.
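
To make the clustering step concrete, the following is a minimal sketch of measuring inter-client similarity with the total variation distance between label distributions and then grouping clients hierarchically. It is not taken from the DC-CFL repository. The function names, the average-linkage rule, and the `threshold` stopping criterion are assumptions made for illustration, since the abstract does not specify them, and the cluster-wise learning via DC analysis is omitted.

```python
from collections import Counter
from itertools import combinations


def label_distribution(labels, num_classes):
    """Empirical label distribution of one client as a list of probabilities."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def tv_distance(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cluster_clients(dists, threshold):
    """Agglomerative clustering with average linkage (an assumed choice).

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, a hypothetical parameter not given in the abstract.
    """
    clusters = [[i] for i in range(len(dists))]

    def linkage(a, b):
        # Mean pairwise TV distance between members of clusters a and b.
        total = sum(tv_distance(dists[i], dists[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Toy data: clients 0 and 1 hold mostly class 0, clients 2 and 3 mostly class 2.
client_labels = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [2, 2, 2, 3],
    [3, 2, 2, 2],
]
dists = [label_distribution(labels, num_classes=4) for labels in client_labels]
clusters = cluster_clients(dists, threshold=0.5)
```

On this toy input, clients with identical label distributions have distance 0 and are merged first, while the two resulting groups have disjoint label support (distance 1) and stay separate.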