Spectral clustering has emerged as one of the most effective clustering methods owing to its strong empirical performance. However, most existing models are designed for centralized settings, which makes them inapplicable in modern decentralized environments. Moreover, current federated learning approaches often generalize poorly because they rely on unreliable pseudo-labels, and they fail to capture the latent correlations among heterogeneous clients. To tackle these limitations, this paper proposes a novel framework named Federated Multi-Task Clustering (FMTC), which learns personalized clustering models for heterogeneous clients while collaboratively exploiting their shared underlying structure in a privacy-preserving manner. Specifically, FMTC consists of two main components: a client-side personalized clustering module, which learns a parameterized mapping model that supports robust out-of-sample inference and thereby bypasses the need for unreliable pseudo-labels; and a server-side tensorial correlation module, which explicitly captures the knowledge shared across all clients. The latter organizes all client models into a unified tensor and applies a low-rank regularization to discover their common subspace. To solve the resulting joint optimization problem, we derive an efficient, privacy-preserving distributed algorithm based on the Alternating Direction Method of Multipliers (ADMM), which decomposes the global problem into parallel local updates on the clients and an aggregation step on the server. Finally, extensive experiments on multiple real-world datasets demonstrate that FMTC significantly outperforms various baseline and state-of-the-art federated clustering algorithms.
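The abstract does not state FMTC's exact objective, so the following is only a minimal sketch of the client/server ADMM pattern it describes, built on explicit assumptions. Each client m is assumed to fit a linear mapping `W_m` (d × c) from its features `X_m` to a local spectral embedding `F_m` by least squares. The server's low-rank tensor regularizer is approximated here by the nuclear norm of the client-mode unfolding of the stacked d × c × M tensor; the paper's actual tensor norm may differ (for example, a t-SVD-based one). All function names (`spectral_embedding`, `svt`, `client_update`, `server_update`, `fmtc_admm_sketch`) and parameters (`lam`, `rho`, `sigma`) are hypothetical. The sketch also ignores the rotational ambiguity of spectral embeddings across clients.

```python
import numpy as np

def spectral_embedding(X, c, sigma=1.0):
    """Hypothetical local step: top-c eigenvectors of D^{-1/2} A D^{-1/2} (RBF affinity)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    A = np.exp(-dist2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    return vecs[:, -c:]                  # eigenvectors of the c largest eigenvalues

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def client_update(X, F, G, U, rho):
    """Local ADMM step: argmin_W ||X W - F||_F^2 + (rho/2) ||W - G + U||_F^2 (closed form)."""
    d = X.shape[1]
    lhs = 2 * X.T @ X + rho * np.eye(d)
    rhs = 2 * X.T @ F + rho * (G - U)
    return np.linalg.solve(lhs, rhs)

def server_update(Ws, Us, lam, rho):
    """Server step: stack client models, unfold along the client mode, shrink singular values."""
    M = len(Ws)
    d, c = Ws[0].shape
    T = np.stack([W + U for W, U in zip(Ws, Us)], axis=0).reshape(M, d * c)
    L = svt(T, lam / rho)
    return [L[m].reshape(d, c) for m in range(M)]

def fmtc_admm_sketch(Xs, Fs, lam=0.1, rho=1.0, iters=100):
    """Scaled-form consensus ADMM over M clients (hypothetical stand-in for FMTC's solver)."""
    d, c = Xs[0].shape[1], Fs[0].shape[1]
    Gs = [np.zeros((d, c)) for _ in Xs]  # server-side low-rank copies
    Us = [np.zeros((d, c)) for _ in Xs]  # scaled dual variables
    for _ in range(iters):
        # Clients update in parallel using only their own data.
        Ws = [client_update(X, F, G, U, rho) for X, F, G, U in zip(Xs, Fs, Gs, Us)]
        # Server aggregates W_m + U_m and applies the low-rank proximal step.
        Gs = server_update(Ws, Us, lam, rho)
        # Dual ascent on the consensus constraint W_m = G_m.
        Us = [U + W - G for U, W, G in zip(Us, Ws, Gs)]
    return Ws, Gs
```

In this sketch a client only ever transmits `W_m + U_m`, never its raw data `X_m`; out-of-sample inference for a new point `x` on client m would then be `x @ W_m` followed by a local assignment step such as k-means on the projected coordinates.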