Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy, and it has made significant progress in medical image classification. A common assumption is task homogeneity, where each client has access to all classes during training. In clinical practice, however, given a multi-label classification task, each institution may diagnose only a subset of the categories, constrained by its level of medical expertise and the local prevalence of diseases, resulting in task heterogeneity. How to achieve effective multi-label medical image classification under task heterogeneity remains under-explored. In this paper, we first formulate this realistic label-missing setting in the multi-label FL domain and propose a two-stage method, FedMLP, that combats missing classes from two aspects: pseudo-label tagging and global knowledge learning. The former uses a warmed-up model to generate class prototypes and selects high-confidence samples to supplement missing labels, while the latter uses the global model as a teacher for consistency regularization, preventing the forgetting of missing-class knowledge. Experiments on two publicly available medical datasets validate the superiority of FedMLP over state-of-the-art federated semi-supervised and noisy-label learning approaches under task heterogeneity. Code is available at https://github.com/szbonaldo/FedMLP.
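The two stages described above can be sketched in a few lines. This is a minimal, illustrative NumPy sketch, not the authors' implementation: the function names (`fill_missing_labels`, `consistency_loss`), the thresholds `tau` and `sim_th`, the cosine-similarity prototype matching, and the MSE form of the consistency term are all assumptions for exposition; the paper and repository define the actual procedure.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between each row of a and a single vector b."""
    na = np.linalg.norm(a, axis=1) + 1e-8
    nb = np.linalg.norm(b) + 1e-8
    return a @ b / (na * nb)

def fill_missing_labels(features, probs, known, tau=0.9, sim_th=0.8):
    """Stage 1 (sketch): tag missing labels with class prototypes.

    features: (N, D) embeddings from a warmed-up model
    probs:    (N, C) per-class sigmoid confidences
    known:    (N, C) bool mask, True where the class is annotated locally
    Returns a (N, C) pseudo-label matrix; -1 marks still-unknown entries.
    """
    N, C = probs.shape
    pseudo = np.full((N, C), -1, dtype=int)
    for c in range(C):
        conf_pos = probs[:, c] > tau            # confident positives
        if not conf_pos.any():
            continue
        proto = features[conf_pos].mean(axis=0)  # class prototype
        sims = cosine(features, proto)
        for i in range(N):
            if known[i, c]:
                continue
            if probs[i, c] > tau and sims[i] > sim_th:
                pseudo[i, c] = 1                 # high-confidence positive
            elif probs[i, c] < 1 - tau:
                pseudo[i, c] = 0                 # high-confidence negative
    return pseudo

def consistency_loss(student_probs, teacher_probs, missing_mask):
    """Stage 2 (sketch): penalize student/teacher disagreement (MSE here)
    on the classes the client cannot annotate, using the global model as
    the teacher."""
    diff = (student_probs - teacher_probs) ** 2
    m = missing_mask.astype(float)
    return float((diff * m).sum() / max(m.sum(), 1.0))
```

Entries that clear neither the confidence nor the similarity threshold stay unlabeled (-1) and contribute no supervision, which is what makes the pseudo-label stage conservative.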