One of the most challenging issues in federated learning is that the data are often not independent and identically distributed (non-IID). Clients are expected to contribute the same type of data, drawn from one global distribution. However, data are often collected in different ways from different sources, so the data distribution at each client may differ from the underlying global distribution. This mismatch causes weight divergence and reduces federated learning performance. This work focuses on improving federated learning performance when the data distribution is skewed across clients. The main idea is to use sample weights to bring each client's distribution closer to the global distribution, so that the machine learning model converges faster and reaches higher accuracy. We start from the fundamental concept of empirical risk minimization and theoretically derive a solution for adjusting the distribution skewness using sample weights. To determine the sample weights, we implicitly exchange density information by leveraging a neural network-based density estimation model, MADE. The clients' data distributions can then be adjusted without exposing their raw data. Our experimental results on three real-world datasets show that the proposed method not only improves federated learning accuracy but also significantly reduces communication costs compared to the other methods evaluated.
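To make the sample-weighting idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the weights take the standard importance-weighting form, the ratio of global density to client density, and it substitutes closed-form 1-D Gaussian densities for the MADE density estimator. All names (`gaussian_pdf`, `importance_weights`, `weighted_risk`) and the toy distributions are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Closed-form 1-D Gaussian density (illustrative stand-in for a learned density model)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, global_pdf, client_pdf, eps=1e-12):
    """Assumed weight form: w(x) = p_global(x) / p_client(x), rescaled to mean 1."""
    w = global_pdf(x) / np.maximum(client_pdf(x), eps)
    return w / w.mean()


def weighted_risk(losses, weights):
    """Sample-weighted empirical risk: sum_i w_i * loss_i / sum_i w_i."""
    return float(np.sum(weights * losses) / np.sum(weights))


def global_pdf(x):
    # Hypothetical global distribution: N(0, 1).
    return gaussian_pdf(x, 0.0, 1.0)


def client_pdf(x):
    # Hypothetical skewed client distribution: N(1, 1).
    return gaussian_pdf(x, 1.0, 1.0)


rng = np.random.default_rng(0)
x_client = rng.normal(loc=1.0, scale=1.0, size=20_000)  # samples held by the client

# Toy per-sample loss: squared error of a model that always predicts 0.
losses = x_client ** 2

w = importance_weights(x_client, global_pdf, client_pdf)
unweighted = weighted_risk(losses, np.ones_like(losses))  # risk under the client distribution
weighted = weighted_risk(losses, w)                       # estimate of risk under the global distribution

print(f"unweighted client risk: {unweighted:.3f}")
print(f"weighted client risk:   {weighted:.3f}")
```

In this toy setting the expected loss is 1 under the global distribution and 2 under the client distribution, so the weighted risk should land near 1 while the unweighted risk stays near 2. In a federated setting the two densities would be estimated rather than known, which is the role the abstract assigns to MADE.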