Federated learning (FL) enables a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data are not independently and identically distributed (non-IID), typically exhibiting both intra-domain and inter-domain heterogeneity. However, recent research is limited to using averaged signals as a simple form of regularization and addresses only one of these non-IID challenges. Given these limitations, this paper clarifies the two non-IID challenges and introduces cluster representations to address them from both local and global perspectives. Specifically, we propose a dual-clustered feature contrast-based FL framework with two focuses. First, we cluster the local representations of each client to capture fine-grained intra-class information from these local clusters. We then facilitate cross-client knowledge sharing by pulling each local representation closer to clusters shared by clients with similar semantics while pushing it away from clusters with dissimilar semantics. Second, since the sizes of local clusters belonging to the same class may differ across clients, we further apply clustering on the global side and average the results to create a consistent global signal that guides each client's local training in a contrastive manner. Experimental results on multiple datasets demonstrate that our proposal achieves comparable or superior performance gains under both intra-domain and inter-domain heterogeneity.
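The dual-clustered contrast described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `kmeans` helper, the cosine-similarity InfoNCE-style loss, the temperature `tau`, and the `global_class_signal` averaging routine are all illustrative assumptions standing in for local clustering, the pull/push contrastive objective, and the server-side global signal, respectively.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Simple k-means over feature representations with deterministic
    farthest-point initialization (illustrative stand-in for the local
    clustering step)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels

def prototype_contrastive_loss(z, pos_proto, neg_protos, tau=0.5):
    """InfoNCE-style loss: pull a local representation z toward a
    semantically similar cluster prototype and push it away from
    dissimilar ones (assumed cosine similarity and temperature tau)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(z, pos_proto)] + [cos(z, p) for p in neg_protos]) / tau
    logits -= logits.max()  # numerical stability
    return float(-logits[0] + np.log(np.exp(logits).sum()))

def global_class_signal(client_protos):
    """Server side: average per-class cluster prototypes across clients to
    form one consistent global signal per class, regardless of how many
    local clusters each client contributed."""
    pooled = {}
    for protos in client_protos:  # one {class: [prototypes]} dict per client
        for cls, ps in protos.items():
            pooled.setdefault(cls, []).append(np.mean(ps, axis=0))
    return {cls: np.mean(v, axis=0) for cls, v in pooled.items()}
```

In this sketch, each client would cluster its own features with `kmeans`, minimize `prototype_contrastive_loss` against semantically similar and dissimilar cluster prototypes, and receive the class-wise averages from `global_class_signal` as the consistent global guidance signal.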