Federated learning (FL), which allows multiple client devices in a network to jointly train a machine learning model without directly exposing the clients' data, is an emerging distributed learning technique owing to its privacy-preserving nature. However, models trained with FL have been found to usually perform worse than their counterparts trained in the standard centralized mode, especially when the training data are imbalanced. In FL, data imbalance may occur locally on a single client device or globally across many devices. The complexity of these different types of data imbalance poses challenges to the development of FL techniques, particularly because the imbalance must be mitigated while data privacy is preserved at the same time. Many attempts have therefore been made in the literature to handle class imbalance in FL. In this paper, we present a detailed review of recent advances along this line. We first introduce the various types of class imbalance in FL, and then review existing methods that estimate the extent of class imbalance without access to the actual data, thereby preserving data privacy. Next, we discuss existing methods for handling class imbalance in FL and analyze the advantages and disadvantages of these approaches. Finally, we summarize common evaluation metrics for class-imbalanced tasks and point out potential future directions.
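To illustrate why local and global imbalance must be distinguished, the following minimal sketch uses two hypothetical clients with binary labels. It assumes the imbalance ratio (majority-class count divided by minority-class count) as the measure of imbalance; the function name `imbalance_ratio` and the client data are illustrative only. Each client is heavily imbalanced on its own, yet the pooled distribution is perfectly balanced. Pooling labels is done here purely for illustration: in an actual FL deployment the server cannot see client labels, which is why privacy-preserving estimation methods are needed.

```python
from collections import Counter


def imbalance_ratio(labels, num_classes):
    """Majority-class count divided by minority-class count (inf if a class is absent)."""
    counts = Counter(labels)
    sizes = [counts.get(c, 0) for c in range(num_classes)]
    smallest = min(sizes)
    return float("inf") if smallest == 0 else max(sizes) / smallest


# Hypothetical two-client setup with binary labels (illustrative data only).
clients = {
    "client_a": [0] * 90 + [1] * 10,  # mostly class 0
    "client_b": [0] * 10 + [1] * 90,  # mostly class 1
}

# Local imbalance: computed independently on each client's own labels.
local = {name: imbalance_ratio(y, 2) for name, y in clients.items()}

# Global imbalance: computed on the union of all clients' labels
# (only possible here because this toy example centralizes the data).
pooled = [label for y in clients.values() for label in y]
global_ratio = imbalance_ratio(pooled, 2)

print(local)         # {'client_a': 9.0, 'client_b': 9.0}
print(global_ratio)  # 1.0
```

The reverse situation is also possible: every client can hold a balanced sample of a globally rare class, so each client looks balanced while the global distribution is not.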