Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

Many existing federated learning (FL) methods assume that clients have fully labeled data, whereas in realistic settings clients have few labels because labeling is expensive and laborious. With limited labeled data, a client's local model often generalizes poorly to its larger pool of unlabeled local data, for example when the class distributions of the labeled and unlabeled data are mismatched. Clients may instead turn to the global model trained across clients to leverage their unlabeled data, but data heterogeneity across clients makes this difficult as well. We propose FedLabel, in which each client selectively chooses either the local or the global model to pseudo-label its unlabeled data, depending on which is more of an expert on the data. We further exploit the knowledge of both models via global-local consistency regularization, which minimizes the divergence between the two models' outputs when they assign identical pseudo-labels to the unlabeled data. Unlike other semi-supervised FL baselines, our method requires neither additional experts beyond the local and global models nor the communication of additional parameters. We also do not assume any server-labeled data or fully labeled clients. In both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
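To make the two ingredients concrete, the following is a minimal illustrative sketch, not the implementation used in this work. The abstract does not specify how expertise is measured or which divergence is used, so the sketch assumes that the more confident model (higher maximum softmax probability) is the expert for a given sample, that a fixed confidence `threshold` filters pseudo-labels, and that the divergence is the KL divergence from the global to the local prediction. The function names `select_pseudo_labels` and `consistency_loss` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def select_pseudo_labels(local_logits, global_logits, threshold=0.9):
    """Pick, per unlabeled sample, the pseudo-label of the more confident model.

    Assumption: "expertise" is approximated by the maximum softmax probability.
    Returns (labels, use_global, mask), where mask marks samples whose chosen
    confidence reaches the (assumed) threshold.
    """
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    conf_local = p_local.max(axis=1)
    conf_global = p_global.max(axis=1)
    use_global = conf_global > conf_local
    labels = np.where(use_global, p_global.argmax(axis=1), p_local.argmax(axis=1))
    mask = np.maximum(conf_local, conf_global) >= threshold
    return labels, use_global, mask


def consistency_loss(local_logits, global_logits, eps=1e-12):
    """Mean KL(global || local) over samples where both pseudo-labels agree.

    Assumption: KL divergence stands in for the unspecified divergence.
    Returns 0.0 when the two models agree on no sample.
    """
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    agree = p_local.argmax(axis=1) == p_global.argmax(axis=1)
    if not agree.any():
        return 0.0
    kl = np.sum(p_global * (np.log(p_global + eps) - np.log(p_local + eps)), axis=1)
    return float(kl[agree].mean())


if __name__ == "__main__":
    # Two unlabeled samples, three classes (toy logits, not real data).
    local_logits = [[5.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    global_logits = [[1.0, 0.0, 0.0], [0.0, 0.0, 6.0]]
    labels, use_global, mask = select_pseudo_labels(local_logits, global_logits)
    print(labels, use_global, mask)
    print(consistency_loss(local_logits, global_logits))
```

In this toy input the local model is more confident on the first sample and the global model on the second, so each sample takes its pseudo-label from a different model; the consistency term is computed only on the first sample, where the two models agree on the label.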
