Federated clustering, an essential extension of centralized clustering to federated scenarios, enables multiple data-holding clients to collaboratively group data while keeping their data local. In centralized scenarios, clustering driven by representation learning has made significant advances in handling high-dimensional complex data. However, the combination of federated clustering and representation learning remains underexplored. To bridge this gap, we first tailor a cluster-contrastive model for learning clustering-friendly representations. We then use this model as the foundation for a new federated clustering method, named cluster-contrastive federated clustering (CCFC). Benefiting from representation learning, CCFC achieves clustering performance that is, in some cases, even double that of the best baseline methods. Compared to the most related baseline, this benefit yields substantial NMI score improvements, of up to 0.4155 in the most conspicuous case. Moreover, from a practical viewpoint, CCFC also shows superior performance in handling device failures.