Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL faces an inherent challenge when the data are non-independent and identically distributed (non-IID) across devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method that shares auxiliary data in addition to local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method that moves real features towards the decision boundary, so that the synthetic data not only improve model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation that relieves data heterogeneity. Theoretical analysis highlights the effectiveness of the proposed data synthesis method in addressing the non-IID challenge. Simulation results further demonstrate that the proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
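The abstract does not state the HFMDS objective, so the following is only a toy sketch of the two ideas it names: moving real features towards a decision boundary, then optimizing synthetic inputs so that their features match those "hard" features. Everything here is an assumption made for illustration: the linear feature extractor, the linear decision boundary `w·f + b = 0`, the squared-error matching loss, and the function names `harden_features` and `synthesize`. None of it is taken from the paper.

```python
import numpy as np


def harden_features(feats, w, b, step=0.5):
    """Move each feature a fraction `step` of the way to the hyperplane w.f + b = 0.

    Assumption: a linear decision boundary; the paper's boundary is defined
    by its trained model and may be handled differently.
    """
    margins = feats @ w + b                      # unnormalized signed margins, shape (n,)
    return feats - step * np.outer(margins / float(w @ w), w)


def synthesize(target_feats, extractor, n_steps=200, lr=0.5, seed=0):
    """Gradient descent on inputs x so that x @ extractor.T matches target_feats.

    Assumption: a fixed linear extractor of shape (d_feat, d_in) and a
    squared-error matching loss 0.5 * ||x @ extractor.T - target_feats||^2.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(target_feats.shape[0], extractor.shape[1]))
    for _ in range(n_steps):
        residual = x @ extractor.T - target_feats  # shape (n, d_feat)
        x -= lr * residual @ extractor             # gradient of the loss w.r.t. x
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Extractor with orthonormal rows, so the fixed step size above is stable.
    q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
    extractor = q.T                                # shape (3, 8)
    x_real = rng.normal(size=(5, 8))
    w, b = np.array([1.0, -2.0, 0.5]), 0.3         # hypothetical linear classifier

    real_feats = x_real @ extractor.T
    hard_feats = harden_features(real_feats, w, b)
    x_syn = synthesize(hard_feats, extractor)
    print("feature matching error:", np.abs(x_syn @ extractor.T - hard_feats).max())
```

In this sketch the synthetic inputs start from random noise and are constrained only through the extractor, so they need not resemble the real inputs in the directions the extractor ignores. That mirrors the abstract's description of keeping class-relevant features and discarding redundant ones, but it is no evidence of the privacy properties claimed for the actual method.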