With growing attention to privacy protection, Federated Learning (FL), a promising distributed machine learning paradigm, is attracting increasing interest. However, because data on real-world devices is unevenly distributed, federated learning achieves lower classification accuracy than traditional machine learning in non-IID scenarios. Although many optimization algorithms have been proposed, the aggregation of local models at the parameter server remains relatively traditional. This paper proposes a new algorithm, FedPDC, which uses the shared datasets available in some industries to optimize both the aggregation of local models and the loss function used in local training. Across many benchmark experiments, FedPDC effectively improves the accuracy of the global model under extremely unbalanced data distributions while preserving the privacy of client data. Moreover, this accuracy improvement incurs no additional communication cost.