Federated Learning (FL) has surged in prominence due to its ability to train models collaboratively without direct data sharing. However, the wide disparity in local data distributions across clients, commonly termed the Non-Independent and Identically Distributed (Non-IID) challenge, poses a significant hurdle to FL's generalization. The problem is compounded when not all clients participate in training, a common occurrence due to unstable network connections or limited computational capacity, which greatly complicates assessing a trained model's generalization ability. While many recent studies have focused on the generalization gap with respect to unseen data from participating clients with diverse distributions, the gap between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, this paper presents an information-theoretic generalization framework for FL. Specifically, it quantifies generalization error through the information entropy of local distributions and the discrepancies among these distributions. Guided by the derived generalization bounds, we introduce a weighted aggregation approach and two client selection strategies. These methods improve FL's generalization by incorporating a more diverse range of client data distributions, thereby helping trained models perform better on non-participating clients. Extensive empirical evaluations confirm the effectiveness of the proposed methods, in line with our theoretical analysis.
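To make the weighted-aggregation idea concrete, the sketch below shows one plausible instantiation: each client's model update is weighted by the Shannon entropy of its local label distribution, so clients with more diverse data contribute more to the global model. This is an illustrative example under our own simplifying assumptions (parameter vectors as NumPy arrays, label-count histograms as the distribution proxy), not the paper's exact algorithm.

```python
import numpy as np

def label_entropy(label_counts):
    """Shannon entropy (nats) of a client's empirical label distribution,
    used here as a simple proxy for local data diversity."""
    p = np.asarray(label_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

def entropy_weighted_aggregate(client_models, client_label_counts):
    """Aggregate client parameter vectors with entropy-proportional weights.

    Clients whose local label distributions have higher entropy (i.e., more
    diverse data) receive larger aggregation weights, in the spirit of
    favoring a broader coverage of the overall data distribution.
    """
    weights = np.array([label_entropy(c) for c in client_label_counts])
    weights = weights / weights.sum()  # normalize to a convex combination
    return sum(w * m for w, m in zip(weights, client_models))
```

For instance, a client with a balanced label histogram `[5, 5]` has entropy `ln 2`, while a client holding only one class (`[10, 0]`) has entropy 0 and is effectively down-weighted in the round's aggregate.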