Federated learning enables multiple decentralized clients to learn collaboratively without sharing their local training data. However, the high cost of annotating data on local clients remains an obstacle to using that data. In this paper, we propose a federated active learning paradigm that efficiently learns a global model under a limited annotation budget while protecting data privacy in a decentralized manner. The main challenge in federated active learning is the mismatch between the active sampling goal of the global model on the server and that of the asynchronous local clients. This mismatch becomes even more pronounced when data are distributed non-IID across local clients. To address this challenge, we propose Knowledge-Aware Federated Active Learning (KAFAL), which consists of Knowledge-Specialized Active Sampling (KSAS) and Knowledge-Compensatory Federated Update (KCFU). KSAS is a novel active sampling method tailored to the federated active learning problem. It handles the mismatch by actively sampling based on the discrepancies between the local and global models. KSAS intensifies the specialized knowledge in local clients, ensuring that the sampled data are informative for both the local clients and the global model. KCFU, meanwhile, deals with the client heterogeneity caused by limited data and non-IID data distributions. It compensates for each client's weakness on its weak classes with the assistance of the global model. Extensive experiments and analyses demonstrate the superiority of KSAS over state-of-the-art active learning methods and the efficiency of KCFU under the federated active learning framework.
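As a rough illustration of sampling "based on the discrepancies between local and global models", the sketch below scores each unlabeled sample by how far the local model's predicted class distribution diverges from the global model's, then picks the highest-scoring samples up to the annotation budget. The abstract does not specify the discrepancy measure or how specialized knowledge is intensified, so the use of KL divergence, the function names, and the toy logits are all assumptions made for illustration; this is not the KSAS scoring rule itself.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Per-sample KL(p || q) for row-wise probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=1)


def select_by_discrepancy(local_logits, global_logits, budget):
    """Pick the `budget` unlabeled samples on which the two models disagree most.

    Assumption: disagreement is measured as KL(local || global); the paper
    may use a different or more elaborate discrepancy score.
    """
    scores = kl_divergence(softmax(local_logits), softmax(global_logits))
    order = np.argsort(-scores, kind="stable")  # highest discrepancy first
    return order[:budget], scores


# Toy unlabeled pool: three samples, two classes (hypothetical logits).
local_logits = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
global_logits = np.array([[2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
picked, scores = select_by_discrepancy(local_logits, global_logits, budget=1)
print(picked, scores)
```

In this toy pool the two models agree on the first two samples and disagree only on the third, so the third is the one sent for annotation. A client would run such a scoring step on its own unlabeled pool each round, so no raw data needs to leave the client.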