The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is one where a few clients have fully labeled data while the other clients have fully unlabeled data. This is particularly common in healthcare, where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients, because the objective function of each client depends on whether labels are available. This paper investigates an alternative approach to training effectively with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme designed specifically for SSFL, which we call Isolated Federated Learning (IsoFed), that circumvents the problem by not simply averaging supervised and semi-supervised models together. In particular, our training approach consists of two parts: (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our method on publicly available medical image datasets of four different modalities from the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.
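Part (a), isolated aggregation, can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so this example assumes a sample-size-weighted average (FedAvg-style) applied separately within the labeled and the unlabeled client groups. The function names, the tuple layout of `updates`, and the toy parameters are all hypothetical, and part (b), the local self-supervised pretraining, is not shown.

```python
import numpy as np


def weighted_average(states, sizes):
    """Sample-size-weighted average of parameter dicts.

    Assumption: a FedAvg-style rule; the abstract does not specify one.
    """
    total = float(sum(sizes))
    keys = states[0].keys()
    return {k: sum(s[k] * (n / total) for s, n in zip(states, sizes)) for k in keys}


def isolated_aggregate(updates):
    """Aggregate labeled and unlabeled client models separately.

    updates: list of (state_dict, num_samples, is_labeled) tuples (hypothetical layout).
    Returns (labeled_global, unlabeled_global); an empty group yields None.
    """
    groups = {True: ([], []), False: ([], [])}
    for state, num_samples, is_labeled in updates:
        states, sizes = groups[bool(is_labeled)]
        states.append(state)
        sizes.append(num_samples)

    result = []
    for flag in (True, False):
        states, sizes = groups[flag]
        result.append(weighted_average(states, sizes) if states else None)
    return tuple(result)


# Toy example: two labeled clients and two unlabeled clients.
updates = [
    ({"w": np.array([1.0, 1.0])}, 10, True),
    ({"w": np.array([3.0, 3.0])}, 30, True),
    ({"w": np.array([10.0, 0.0])}, 5, False),
    ({"w": np.array([0.0, 10.0])}, 5, False),
]
labeled_global, unlabeled_global = isolated_aggregate(updates)
```

In this sketch the two groups never contribute to the same average, which is the property the abstract describes as avoiding simple averaging of supervised and semi-supervised models together.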