Training Deep Learning (DL) models requires large, high-quality datasets, often assembled from data held by different institutions. Federated Learning (FL) has emerged as a method for pooling datasets in a privacy-preserving way: institutions train collaboratively by iteratively aggregating their locally trained models into a global model. One critical performance challenge for FL arises when the data are not independently and identically distributed (non-IID) across the federation participants. Although this fragility cannot be eliminated, it can be mitigated by suitably optimizing two hyper-parameters: the normalization layer and the collaboration (aggregation) frequency. In this work, we benchmark five normalization layers for training Neural Networks (NNs) under two families of non-IID data skew on two datasets. Results show that Batch Normalization, widely employed in centralized DL, is not the best choice for FL, whereas Group and Layer Normalization consistently outperform it. Similarly, frequent model aggregation decreases convergence speed and model quality.
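A commonly cited intuition for why Batch Normalization is fragile under non-IID data is that it normalizes each feature with statistics computed across the batch, so every client's local data distribution leaks into the normalization. Layer and Group Normalization, by contrast, compute statistics within each individual sample. The pure-Python sketch below illustrates only that difference and makes several simplifying assumptions. It omits learnable scale/shift parameters and running statistics, and it treats inputs as flat feature vectors rather than convolutional channels. The function names `batch_norm`, `layer_norm` and `group_norm` are hypothetical helpers, not the implementations benchmarked in this work.

```python
import math

EPS = 1e-5  # small constant to avoid division by zero


def _normalize(values, eps=EPS):
    """Shift values to zero mean and scale them to (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch):
    """Normalize each feature using statistics computed across all samples in the batch."""
    num_features = len(batch[0])
    columns = [_normalize([row[j] for row in batch]) for j in range(num_features)]
    # Transpose the normalized columns back into per-sample rows.
    return [[columns[j][i] for j in range(num_features)] for i in range(len(batch))]


def layer_norm(batch):
    """Normalize each sample using statistics over all of its own features."""
    return [_normalize(row) for row in batch]


def group_norm(batch, num_groups):
    """Normalize each sample separately within contiguous groups of its own features."""
    out = []
    for row in batch:
        size = len(row) // num_groups  # assumes len(row) is divisible by num_groups
        normed = []
        for g in range(num_groups):
            normed.extend(_normalize(row[g * size:(g + 1) * size]))
        out.append(normed)
    return out


# The same sample appears in the local batches of two clients with different data.
sample = [1.0, 2.0, 3.0, 4.0]
client_a = [sample, [0.0, 1.0, 0.0, 1.0]]
client_b = [sample, [10.0, 20.0, 30.0, 40.0]]

# Batch norm: the shared sample is normalized differently on each client.
print(batch_norm(client_a)[0] == batch_norm(client_b)[0])  # False
# Layer and group norm: the output depends only on the sample itself.
print(layer_norm(client_a)[0] == layer_norm(client_b)[0])  # True
print(group_norm(client_a, 2)[0] == group_norm(client_b, 2)[0])  # True
```

Because the Layer and Group Normalization outputs for a sample do not depend on which other samples a client happens to hold, they behave the same way regardless of how the data are skewed across participants. This is consistent with, though not a proof of, the benchmark results reported above.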