Federated learning is a distributed paradigm that allows multiple parties to collaboratively train deep models without exchanging raw data. However, the data distribution among clients is naturally non-i.i.d., which leads to severe degradation of the learnt model. The primary goal of this paper is to develop a robust federated learning algorithm that addresses feature shift in clients' samples, which can be caused by various factors, e.g., acquisition differences in medical imaging. To reach this goal, we propose FedFA, which tackles federated learning from the distinct perspective of federated feature augmentation. FedFA is based on a major insight: each client's data distribution can be characterized by statistics (i.e., mean and standard deviation) of latent features, and it is possible to manipulate these local statistics globally, i.e., based on information in the entire federation, so that clients gain a better sense of the underlying distribution and local data bias is alleviated. Based on this insight, we propose to augment each local feature statistic probabilistically based on a normal distribution, whose mean is the original statistic and whose variance quantifies the augmentation scope. Key to our approach is the determination of a meaningful Gaussian variance, which is accomplished by taking into account not only the biased data of each individual client, but also the underlying feature statistics characterized by all participating clients. We offer both theoretical and empirical justifications to verify the effectiveness of FedFA. Our code is available at https://github.com/tfzhou/FedFA.
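
To make the augmentation step concrete, the following is a minimal sketch, not the released FedFA implementation. It perturbs the mean and standard deviation of a single feature channel by sampling from Gaussians centred on the original statistics, then re-normalises the features to the sampled statistics. The function names `augment_features` and `augmentation_variance`, and in particular the equal-weight rule that combines a client's own variance with the spread of a statistic across clients, are assumptions made for illustration; the abstract does not specify how the two sources of information are fused.

```python
import random
import statistics


def augment_features(features, var_mu, var_sigma, rng):
    """Perturb the mean and std of one feature channel, then re-normalise.

    var_mu and var_sigma are the Gaussian variances that set the
    augmentation scope for the mean and the standard deviation.
    """
    mu = statistics.fmean(features)
    sigma = statistics.pstdev(features) + 1e-6  # guard against division by zero
    # Sample new statistics from Gaussians centred on the original ones.
    new_mu = rng.gauss(mu, var_mu ** 0.5)
    new_sigma = rng.gauss(sigma, var_sigma ** 0.5)
    # Standardise with the original statistics, rescale with the sampled ones.
    return [new_sigma * (x - mu) / sigma + new_mu for x in features]


def augmentation_variance(local_var, client_stats):
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    average the client's own variance with the variance of the same
    statistic across all clients in the federation."""
    federation_var = statistics.pvariance(client_stats)
    return 0.5 * (local_var + federation_var)


if __name__ == "__main__":
    rng = random.Random(0)
    channel = [1.0, 2.0, 3.0, 4.0]
    # Channel means reported by three hypothetical clients.
    var_mu = augmentation_variance(0.2, [2.5, 3.0, 1.5])
    print(augment_features(channel, var_mu, 0.01, rng))
```

Setting both variances to zero leaves the features unchanged up to floating-point error, which is a convenient sanity check; larger variances widen the range of statistics a client sees during local training.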