Federated Learning (FL) is a decentralized learning paradigm in which multiple clients collaboratively train deep learning models without centralizing their local data, thereby preserving data privacy. Real-world applications usually involve a distribution shift across the datasets of different clients, which hurts the clients' ability to generalize to unseen samples from their respective data distributions. In this work, we address the recently proposed feature shift problem, where the clients have different feature distributions while the label distribution is the same. We propose Federated Representation Augmentation (FRAug) to tackle this practical and challenging problem. Our approach generates synthetic client-specific samples in the embedding space to augment the usually small client datasets. To this end, we train a shared generative model that fuses the knowledge the clients have learned from their different feature distributions. This generator synthesizes client-agnostic embeddings, which are then locally transformed into client-specific embeddings by Representation Transformation Networks (RTNets). By transferring knowledge across clients, the generated embeddings act as a regularizer for the client models and reduce overfitting to the original local datasets, hence improving generalization. Our empirical evaluation on public benchmarks and a real-world medical dataset demonstrates the effectiveness of the proposed method, which substantially outperforms current state-of-the-art FL methods for non-IID features, including PartialFed and FedBN.
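To make the data flow concrete, here is a minimal, dependency-free sketch of the augmentation path described above, under stated assumptions. The abstract does not specify architectures, so everything here is hypothetical: the names `Generator`, `RTNet`, `fedavg`, and `augment_batch`, the single-layer networks, the one-hot label conditioning, and the residual form of the RTNet (output = input + learned offset). Sharing the generator is modeled with plain FedAvg-style weight averaging, which is also an assumption. Weights are random and no training step is shown; the sketch only illustrates how client-agnostic embeddings are sampled, made client-specific, and mixed with real embeddings.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def init_layer(out_dim: int, in_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    # Small random weights and zero bias for one dense layer (illustrative only).
    w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    # y = W x + b, with W stored as a list of rows.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class Generator:
    """Hypothetical shared generator: (noise z, label y) -> client-agnostic embedding."""

    def __init__(self, noise_dim: int, num_classes: int, emb_dim: int, rng: random.Random):
        self.noise_dim, self.num_classes = noise_dim, num_classes
        self.w, self.b = init_layer(emb_dim, noise_dim + num_classes, rng)

    def __call__(self, z: Vector, y: int) -> Vector:
        one_hot = [1.0 if c == y else 0.0 for c in range(self.num_classes)]
        return [max(0.0, v) for v in linear(self.w, self.b, z + one_hot)]  # ReLU


class RTNet:
    """Hypothetical per-client transformation, assumed residual: e + f(e)."""

    def __init__(self, emb_dim: int, rng: random.Random):
        self.w, self.b = init_layer(emb_dim, emb_dim, rng)

    def __call__(self, e: Vector) -> Vector:
        return [ei + di for ei, di in zip(e, linear(self.w, self.b, e))]


def fedavg(client_mats: List[Matrix], sizes: List[int]) -> Matrix:
    # Assumed server step: weighted average of one weight matrix across clients.
    total = sum(sizes)
    rows, cols = len(client_mats[0]), len(client_mats[0][0])
    return [[sum(n * m[i][j] for m, n in zip(client_mats, sizes)) / total
             for j in range(cols)] for i in range(rows)]


def augment_batch(real: List[Vector], labels: List[int], gen: Generator,
                  rtnet: RTNet, n_synth: int, rng: random.Random):
    # Sample labels and noise, generate client-agnostic embeddings,
    # map them to this client's embedding space, and append to the real batch.
    synth, synth_labels = [], []
    for _ in range(n_synth):
        y = rng.randrange(gen.num_classes)
        z = [rng.gauss(0.0, 1.0) for _ in range(gen.noise_dim)]
        synth.append(rtnet(gen(z, y)))
        synth_labels.append(y)
    return real + synth, labels + synth_labels


rng = random.Random(0)
gen = Generator(noise_dim=4, num_classes=3, emb_dim=8, rng=rng)
rtnet = RTNet(emb_dim=8, rng=rng)
real = [[rng.random() for _ in range(8)] for _ in range(5)]
labels = [0, 1, 2, 0, 1]
embs, ys = augment_batch(real, labels, gen, rtnet, n_synth=3, rng=rng)
print(len(embs), len(ys))  # 8 8
```

In this sketch the client's classifier head would then be trained on `embs` and `ys`, so the synthetic embeddings act as extra, label-consistent training signal alongside the real ones, which is the regularizing effect the abstract describes.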