Although federated learning has made remarkable advances, most studies assume that each client's data are fully labeled. In real-world scenarios, however, every client may hold a significant number of unlabeled instances. Among the various approaches to utilizing unlabeled data, federated active learning (FAL) has emerged as a promising solution. In the decentralized setting, two types of query selector models are available, namely the 'global' and 'local-only' models, but little of the literature discusses which of them performs better and why. In this work, we first demonstrate that which of the two selector models is superior depends on the global and local inter-class diversity. Furthermore, we observe that the global and local-only models are key to resolving the imbalance on their respective sides. Based on these findings, we propose LoGo, a FAL sampling strategy that is robust to varying levels of local heterogeneity and global imbalance ratios, and that integrates both models through a two-step active selection scheme. LoGo consistently outperforms six active learning strategies across a total of 38 experimental settings.
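The abstract does not spell out what the two selection steps are. Purely as an illustration of how a local-only and a global selector could be combined in two stages, the sketch below assumes that the first step clusters the unlabeled pool using embeddings from the local-only model, and that the second step picks, within each cluster, the instance on which the global model is least certain (highest predictive entropy). The function names, the k-means clustering, and the entropy score are all assumptions made for this example and should not be read as the LoGo algorithm itself.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct points from the pool.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def entropy(probs, eps=1e-12):
    """Predictive entropy of each row of a class-probability matrix."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=1)


def two_step_select(local_embeddings, global_probs, budget, seed=0):
    """Hypothetical two-step query selection on one client.

    Step 1 (assumed): cluster the pool in the local-only model's
    embedding space, one cluster per query in the budget.
    Step 2 (assumed): from each cluster, take the instance with the
    highest entropy under the global model's predictions.
    """
    labels = kmeans(local_embeddings, budget, seed=seed)
    scores = entropy(global_probs)
    picked = []
    for j in range(budget):
        idx = np.flatnonzero(labels == j)
        if len(idx) > 0:
            picked.append(int(idx[scores[idx].argmax()]))
    # If some clusters ended up empty, fill the remaining budget with
    # the most uncertain instances that were not picked yet.
    chosen = set(picked)
    for i in np.argsort(-scores):
        if len(picked) >= budget:
            break
        if int(i) not in chosen:
            picked.append(int(i))
            chosen.add(int(i))
    return picked
```

In this arrangement the local-only model governs which regions of the client's data are covered, while the global model decides which instance in each region is queried. That is one way of letting each model address the side it is said to be suited for.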