In mobile and IoT systems, Federated Learning (FL) is increasingly important for effectively using data while maintaining user privacy. One key challenge in FL is managing statistical heterogeneity, such as non-i.i.d. data, arising from numerous clients and diverse data sources. This challenge is typically addressed through strategic cooperation, often among clients with similar characteristics. However, we pose a fundamental question: does achieving optimal cooperation necessarily entail cooperating with the most similar clients? In practice, significant model performance improvements are often realized not by partnering with the most similar models, but by leveraging complementary data. Our theoretical and empirical analyses suggest that optimal cooperation is achieved by enhancing complementarity in feature distribution while restricting the disparity in the correlation between features and targets. Accordingly, we introduce a novel framework, \texttt{FedSaC}, which balances similarity and complementarity in FL cooperation. Our framework approximates an optimal cooperation network for each client by optimizing a weighted sum of model similarity and feature complementarity. The strength of \texttt{FedSaC} lies in its adaptability to various levels of data heterogeneity and to multimodal scenarios. Our comprehensive unimodal and multimodal experiments demonstrate that \texttt{FedSaC} markedly surpasses other state-of-the-art FL methods.
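As a rough illustration of the weighted-sum objective mentioned above (a minimal sketch only; the symbols $w_{ij}$, $S_{ij}$, $C_{ij}$, $\lambda$, and the constraint set $\Delta$ are assumed notation for this abstract, not the paper's exact formulation), each client $i$ can be pictured as choosing cooperation weights over the other clients by solving
\[
\max_{\mathbf{w}_i \in \Delta} \; \sum_{j \neq i} w_{ij}\bigl(\lambda\, S_{ij} + (1-\lambda)\, C_{ij}\bigr),
\]
where $S_{ij}$ denotes model similarity between clients $i$ and $j$, $C_{ij}$ denotes feature complementarity, $\lambda \in [0,1]$ trades off the two terms, and $\Delta$ constrains the weights (e.g., non-negative and summing to one).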