Federated learning (FL) enables collaborative learning among decentralized clients while safeguarding the privacy of their local data. Existing studies on FL typically assume that labeled data are available offline at each client when training starts. In practice, however, training data often arrive at clients in a streaming fashion without ground-truth labels. Given the expensive annotation cost, it is critical to identify a subset of informative samples for labeling at each client. However, selecting samples locally while accommodating the global training objective presents a challenge unique to FL. In this work, we tackle this conundrum by framing the data querying process in FL as a collaborative decentralized decision-making problem and proposing an effective solution named LeaDQ, which leverages multi-agent reinforcement learning algorithms. In particular, under implicit guidance from global information, LeaDQ effectively learns local policies for the distributed clients and steers them toward selecting samples that enhance the global model's accuracy. Extensive simulations on image and text tasks show that LeaDQ improves model performance in various FL scenarios, outperforming the baseline algorithms.