Federated Learning (FL) is a paradigm for training machine learning (ML) models in collaborative settings while preserving participants' privacy by keeping raw data local. A key requirement for the use of FL in production is reliability, as insufficient reliability can compromise the validity, stability, and reproducibility of learning outcomes. FL inherently operates as a distributed system and is therefore susceptible to crash failures, network partitioning, and other fault scenarios. Despite this, the impact of such failures on FL outcomes has not yet been studied systematically. In this paper, we address this gap by investigating the impact of missing participants in FL. To this end, we conduct extensive experiments on image, tabular, and time-series data and analyze how the absence of participants affects model performance, taking into account influencing factors such as data skewness, different availability patterns, and model architectures. Furthermore, we examine scenario-specific aspects, including the utility of the global model for missing participants. Our experiments provide detailed insights into the effects of various influencing factors. In particular, we show that data skewness has a strong impact, often leading to overly optimistic model evaluations and, in some cases, even altering the effects of other influencing factors.