Federated learning offers a promising privacy-preserving way to utilize large-scale private edge data from massive numbers of Internet-of-Things (IoT) devices. Existing research has extensively studied how to optimize the learning process, computing efficiency, and communication overhead, but one important and often overlooked aspect is that participants contribute predictive knowledge from their data, which affects the quality of the learned federated models. FedAvg treats all clients alike, assigning aggregation weights based solely on the number of samples, yet the diversity of the samples on each client can greatly affect both the local update performance and the final aggregated model. In this paper, we address this issue with a Weighted Averaging (WeiAvg) framework that emphasizes updates from high-diversity clients and diminishes the influence of updates from low-diversity clients. Specifically, we introduce a projection-based approximation method to estimate the diversity of client data instead of computing an entropy directly. We use this approximation because locally computed entropy cannot be transmitted without incurring excessive privacy risk. Extensive experimental results show that WeiAvg converges faster and achieves higher accuracy than both the original FedAvg algorithm and FedProx.
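The contrast between sample-count weighting and diversity-aware weighting can be illustrated with a minimal sketch. The abstract does not give the WeiAvg weighting formula or the projection-based estimator, so everything below is an assumption made for illustration: the function names `aggregation_weights` and `weighted_average` are hypothetical, the diversity scores are taken as given non-negative numbers, and multiplying sample count by diversity score is only one plausible way to combine them.

```python
from typing import List, Optional, Sequence


def aggregation_weights(
    num_samples: Sequence[float],
    diversity: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return normalized per-client aggregation weights.

    With diversity=None this is FedAvg-style weighting, proportional to
    each client's sample count. With diversity scores it scales each
    count by the client's score. ASSUMPTION: this product rule is an
    illustrative choice, not the formula from the paper.
    """
    if diversity is None:
        raw = [float(n) for n in num_samples]
    else:
        if len(diversity) != len(num_samples):
            raise ValueError("need one diversity score per client")
        raw = [float(n) * float(d) for n, d in zip(num_samples, diversity)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [r / total for r in raw]


def weighted_average(
    updates: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine flattened client updates into one aggregated update."""
    dim = len(updates[0])
    return [
        sum(w * update[i] for w, update in zip(weights, updates))
        for i in range(dim)
    ]


# Two clients with equal sample counts but hypothetical diversity scores.
client_updates = [[1.0, 0.0], [0.0, 1.0]]
sample_counts = [100, 100]
diversity_scores = [0.9, 0.3]  # made-up values, for illustration only

fedavg_weights = aggregation_weights(sample_counts)
diversity_weights = aggregation_weights(sample_counts, diversity_scores)
aggregated = weighted_average(client_updates, diversity_weights)
```

With equal sample counts, the sample-count weighting gives both clients the same influence, while the diversity-aware weighting shifts influence toward the client with the higher score. How the scores are estimated on the client without exposing the entropy is the part the paper's projection-based approximation covers, and it is not reproduced here.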