Federated Learning (FL) has gained prominence because it enables collaborative model training without direct data sharing. However, the wide disparity in local data distributions among clients, commonly called the non-independent and identically distributed (non-IID) challenge, significantly hinders FL's generalization. The situation becomes more complex when not all clients participate in training, which is common because of unstable network connections or limited computational capacity, and this makes it harder to assess how well the trained models generalize. While many recent studies have focused on the generalization gap on unseen data from participating clients with diverse distributions, the divergence between the training distributions of participating clients and the testing distributions of non-participating clients has been largely overlooked. In response, this paper presents an information-theoretic generalization framework for FL. Specifically, it quantifies generalization error by evaluating the information entropy of local distributions and measuring the discrepancies across these distributions. Guided by the derived generalization bounds, we introduce a weighted aggregation approach and two client selection strategies, which aim to improve FL's generalization by covering a more diverse set of client data distributions. Extensive empirical evaluations confirm the effectiveness of the proposed methods and are consistent with the theoretical framework.
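To make the idea of entropy-aware weighted aggregation concrete, the following is a minimal sketch and not the paper's actual algorithm. The abstract does not state the weighting rule, so the sketch assumes that each client's weight is proportional to the Shannon entropy of its empirical label distribution. The function names `label_entropy`, `entropy_weights`, and `aggregate` are hypothetical, and model parameters are represented as flat NumPy vectors for simplicity.

```python
import numpy as np


def label_entropy(labels, num_classes):
    """Shannon entropy (in nats) of a client's empirical label distribution."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # skip empty classes, treating 0 * log(0) as 0
    return float(-(nonzero * np.log(nonzero)).sum())


def entropy_weights(client_labels, num_classes):
    """Assumed rule: aggregation weight proportional to local label entropy."""
    entropies = np.array([label_entropy(y, num_classes) for y in client_labels])
    total = entropies.sum()
    if total == 0:
        # Every client holds a single class: fall back to uniform weights.
        return np.full(len(client_labels), 1.0 / len(client_labels))
    return entropies / total


def aggregate(client_params, weights):
    """Weighted average of client parameter vectors."""
    stacked = np.stack(client_params)  # shape: (num_clients, num_params)
    return np.tensordot(weights, stacked, axes=1)


# Illustrative data: client 0 holds one class, client 1 holds two balanced classes.
client_labels = [np.array([0, 0, 0, 0]), np.array([0, 1, 0, 1])]
client_params = [np.array([1.0, 1.0]), np.array([3.0, 5.0])]

weights = entropy_weights(client_labels, num_classes=2)
global_params = aggregate(client_params, weights)
```

Under this assumed rule, a client whose data covers more classes more evenly contributes more to the global model, which is one simple way to favor a more varied set of client data distributions. The rule derived from the paper's generalization bounds may differ.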