Federated learning (FL) has emerged as a promising paradigm for training segmentation models on decentralized medical data, owing to its privacy-preserving property. However, existing research overlooks the annotation noise that is prevalent in real-world medical datasets and that limits the performance ceiling of FL. In this paper, we identify and tackle this problem for the first time. For problem formulation, we propose a contour evolution model to capture non-independent and identically distributed (Non-IID) noise across pixels within each client, and then extend it to multi-source data to form a heterogeneous noise model (i.e., Non-IID annotation noise across clients). For robust learning from annotations with this two-level Non-IID noise, we emphasize the importance of data quality in model aggregation, allowing high-quality clients to have a greater impact on FL. To achieve this, we propose Federated learning with Annotation quAlity-aware AggregatIon, named FedA$^3$I, which introduces a quality factor based on client-wise noise estimation. Specifically, the noise at each client is estimated with a Gaussian mixture model, and the estimate is then incorporated into model aggregation in a layer-wise manner to up-weight high-quality clients. Extensive experiments on two real-world medical image segmentation datasets demonstrate the superior performance of FedA$^3$I against state-of-the-art approaches in dealing with cross-client annotation noise. The code is available at https://github.com/wnn2000/FedAAAI.
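To make the aggregation idea concrete, the following is a minimal sketch under stated assumptions, not the released FedA$^3$I implementation. The abstract does not say which statistic the Gaussian mixture model is fitted on, how the quality factor is defined, or how the layer-wise rule works. The sketch therefore assumes (i) a two-component 1-D mixture fitted on per-sample training losses, with the high-mean component treated as noisy, (ii) a quality factor of one minus the estimated noise rate, and (iii) the same client weights applied to every layer. The names `fit_gmm_1d`, `estimate_noise_rate`, and `quality_weighted_average` are hypothetical.

```python
import math


def fit_gmm_1d(values, iters=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns the component means and the per-value responsibilities.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # initialise the two components at the extremes
    var0 = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            dens = [
                weights[k]
                * math.exp(-((v - means[k]) ** 2) / (2.0 * variances[k]))
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate mean, variance, and mixing weight.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # variance floor avoids collapse
            weights[k] = nk / len(values)
    return means, resp


def estimate_noise_rate(losses):
    """Assumption: samples in the high-loss component carry noisy labels."""
    means, resp = fit_gmm_1d(losses)
    noisy = 0 if means[0] > means[1] else 1
    return sum(r[noisy] for r in resp) / len(losses)


def quality_weighted_average(client_models, sizes, noise_rates):
    """Aggregate layer by layer with weights n_k * (1 - noise_k), normalised.

    client_models: list of dicts mapping layer name -> list of floats.
    The quality factor (1 - noise rate) is an illustrative assumption.
    """
    raw = [n * (1.0 - r) for n, r in zip(sizes, noise_rates)]
    total = sum(raw)
    w = [x / total for x in raw]
    global_model = {}
    for layer in client_models[0]:
        params = [m[layer] for m in client_models]
        global_model[layer] = [
            sum(wk * p[i] for wk, p in zip(w, params))
            for i in range(len(params[0]))
        ]
    return global_model


if __name__ == "__main__":
    # Hypothetical per-sample losses for a mostly clean and a noisier client.
    clean = [0.08, 0.1, 0.12, 0.09, 0.11, 0.1, 0.13, 0.07, 0.1, 2.0]
    noisy = [0.1, 0.12, 0.08, 0.11, 0.09, 1.9, 2.0, 2.1, 1.95, 2.05]
    rates = [estimate_noise_rate(clean), estimate_noise_rate(noisy)]
    models = [{"conv": [0.0, 3.0]}, {"conv": [3.0, 0.0]}]
    print(rates, quality_weighted_average(models, [100, 100], rates))
```

With equal client sizes, the client with the lower estimated noise rate pulls the aggregated parameters toward its own values, which is the up-weighting behaviour the abstract describes. Note that a two-component mixture always splits the data, so a client whose losses form a single cluster would still receive a nonzero noise estimate in this sketch. In practice a library mixture model such as scikit-learn's `GaussianMixture` could replace the hand-written EM loop.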