Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to these challenges, we propose a pioneering framework called FLea, incorporating the following key components: i) A global feature buffer that stores activation-target pairs shared from multiple clients to support local training. This design mitigates local model drift caused by the absence of certain classes; ii) A feature augmentation approach based on local and global activation mix-ups for local training. This strategy enlarges the training samples, thereby reducing the risk of local overfitting; iii) An obfuscation method to minimize the correlation between intermediate activations and the source data, enhancing the privacy of shared features. To verify the superiority of FLea, we conduct extensive experiments using a wide range of data modalities, simulating different levels of local data scarcity and label skew. The results demonstrate that FLea consistently outperforms state-of-the-art FL counterparts (in 13 of the 18 experimented settings, the improvement exceeds 5%), while concurrently mitigating the privacy vulnerabilities associated with shared features. Code is available at https://github.com/XTxiatong/FLea.git.
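The feature augmentation in component ii) applies the mixup recipe at the activation level: each local intermediate activation is linearly interpolated with one drawn from the globally shared buffer, and the targets are interpolated with the same coefficient. Below is a minimal sketch under that assumption; all function and parameter names are our own illustration, not taken from the FLea codebase.

```python
import numpy as np

def feature_mixup(local_feats, local_labels, global_feats, global_labels,
                  num_classes, alpha=2.0, rng=None):
    """Mix local activations with globally shared ones (mixup at the feature level).

    local_feats:  (B, D) intermediate activations from the local batch
    local_labels: (B,)   integer class labels for the local batch
    global_feats, global_labels: activation-target pairs sampled from the buffer
    Returns mixed features and soft (interpolated one-hot) targets.
    """
    rng = rng or np.random.default_rng()
    B = local_feats.shape[0]
    # Pair each local sample with a random entry from the global buffer
    idx = rng.integers(0, global_feats.shape[0], size=B)
    g_feats, g_labels = global_feats[idx], global_labels[idx]
    # Mixing coefficients drawn from a Beta distribution, as in standard mixup
    lam = rng.beta(alpha, alpha, size=(B, 1))
    mixed = lam * local_feats + (1.0 - lam) * g_feats
    # Interpolate one-hot targets with the same coefficients
    onehot = np.eye(num_classes)
    targets = lam * onehot[local_labels] + (1.0 - lam) * onehot[g_labels]
    return mixed, targets
```

Because the mixed targets are soft labels, local training would use a cross-entropy loss against these interpolated distributions rather than hard class indices.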