Federated Learning (FL) enables local devices to collaboratively learn a shared predictive model while only periodically sharing model parameters with a central aggregator. However, FL can be hampered by statistical heterogeneity: each local device's data distribution differs, so the data across devices deviates to varying degrees from being Independent and Identically Distributed (IID). The problem becomes more complex when optimising different combinations of FL parameters and choosing an aggregator. In this paper, we present an empirical analysis of different FL training parameters and aggregators over various levels of statistical heterogeneity on three datasets. We propose a systematic data partition strategy to simulate different levels of statistical heterogeneity and a metric to measure the level of IID. Additionally, we empirically identify the best FL model and key parameters for datasets of different characteristics. On the basis of these results, we present recommended guidelines for FL parameters and aggregators to optimise model performance under different levels of IID and with different datasets.
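To make the idea of simulating different levels of statistical heterogeneity concrete, the following is a minimal sketch of one common way to do it. It is not the partition strategy or the IID metric proposed in the paper, neither of which is specified in this abstract. The function names `partition_by_label_skew` and `mean_label_divergence`, the `iid_fraction` parameter, and the choice of total variation distance as a heterogeneity score are all assumptions made for illustration.

```python
import random
from collections import Counter


def partition_by_label_skew(labels, num_clients, iid_fraction, seed=0):
    """Split sample indices across clients with a tunable label skew.

    Hypothetical helper, not the paper's strategy.
    iid_fraction=1.0 deals shuffled samples to every client (roughly IID);
    iid_fraction=0.0 gives each client a contiguous block of label-sorted data.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    cut = int(len(indices) * iid_fraction)
    iid_part = indices[:cut]
    skewed_part = sorted(indices[cut:], key=lambda i: labels[i])

    clients = [[] for _ in range(num_clients)]
    # Shuffled portion: deal round-robin so each client gets a random label mix.
    for pos, idx in enumerate(iid_part):
        clients[pos % num_clients].append(idx)
    # Label-sorted portion: contiguous blocks so each client sees few labels.
    n = len(skewed_part)
    for c in range(num_clients):
        start = c * n // num_clients
        end = (c + 1) * n // num_clients
        clients[c].extend(skewed_part[start:end])
    return clients


def mean_label_divergence(labels, clients):
    """Average total variation distance between each client's label
    distribution and the global one (0 means identical distributions).

    An assumed stand-in for a heterogeneity score, not the paper's metric.
    """
    total = len(labels)
    global_counts = Counter(labels)
    classes = sorted(global_counts)
    distances = []
    for client in clients:
        if not client:
            continue  # skip empty clients to avoid division by zero
        counts = Counter(labels[i] for i in client)
        tv = 0.5 * sum(
            abs(counts[k] / len(client) - global_counts[k] / total)
            for k in classes
        )
        distances.append(tv)
    return sum(distances) / len(distances)
```

With ten balanced classes and ten clients, `iid_fraction=0.0` assigns each client a single class, so the score should come out at 0.9, while `iid_fraction=1.0` should give a score much closer to 0. Intermediate values of `iid_fraction` give a simple dial for sweeping heterogeneity levels in experiments.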