Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

Federated learning facilitates the collaborative learning of a global model across multiple distributed medical institutions without centralizing data. Nevertheless, the high cost of annotation on local clients remains an obstacle to effectively utilizing local data. To mitigate this issue, federated active learning methods propose leveraging local and global model predictions to select a relatively small amount of informative local data for annotation. However, existing methods mainly assume that all local data are sampled from the same domain, which makes them unreliable in realistic medical scenarios with domain shifts among clients. In this paper, we make the first attempt to assess the informativeness of local data derived from diverse domains and propose a novel methodology termed Federated Evidential Active Learning (FEAL) to calibrate data evaluation under domain shift. Specifically, we introduce a Dirichlet prior distribution in both the local and global models to treat each prediction as a distribution over the probability simplex, and we use the resulting Dirichlet-based evidential model to capture both aleatoric and epistemic uncertainties. We then employ the epistemic uncertainty to calibrate the aleatoric uncertainty. Afterward, we design a diversity relaxation strategy to reduce data redundancy and maintain data diversity. Extensive experiments and analysis on five real multi-center medical image datasets demonstrate the superiority of FEAL over state-of-the-art active learning methods in federated scenarios with domain shifts. The code will be available at https://github.com/JiayiChen815/FEAL.
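
The abstract does not give FEAL's exact formulas, so the following is only a minimal sketch of how a Dirichlet-based evidential prediction can be split into aleatoric and epistemic parts, using standard evidential deep learning conventions. It assumes non-negative per-class evidence e_k with concentration parameters alpha_k = e_k + 1, measures aleatoric uncertainty as the expected entropy under Dir(alpha), and (as an assumption) measures epistemic uncertainty as the mutual information, i.e. total entropy minus aleatoric. The helpers `digamma` and `dirichlet_uncertainty` are hypothetical names; the calibration rule and the diversity relaxation strategy are not reproduced here.

```python
import math
from typing import Dict, List


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x upward until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def dirichlet_uncertainty(evidence: List[float]) -> Dict[str, float]:
    """Hypothetical helper: split the uncertainty of Dir(alpha), alpha = evidence + 1."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]      # Dirichlet concentration parameters
    strength = sum(alpha)                    # S = sum_k alpha_k
    p_mean = [a / strength for a in alpha]   # expected class probabilities
    # Total uncertainty: entropy of the expected categorical distribution.
    total = -sum(p * math.log(p) for p in p_mean)
    # Aleatoric: closed-form expected entropy E_{p ~ Dir(alpha)}[H(p)].
    aleatoric = sum(p * (digamma(strength + 1.0) - digamma(a + 1.0))
                    for p, a in zip(p_mean, alpha))
    # Epistemic (assumed measure): mutual information = total - aleatoric.
    epistemic = total - aleatoric
    return {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}


if __name__ == "__main__":
    # Illustrative two-class evidence vectors (made up, not from the paper).
    for name, ev in [("confident", [100.0, 0.0]),
                     ("ambiguous", [50.0, 50.0]),
                     ("no evidence", [0.0, 0.0])]:
        u = dirichlet_uncertainty(ev)
        print(name, {k: round(v, 4) for k, v in u.items()})
```

In this toy comparison, plentiful but balanced evidence yields high aleatoric and near-zero epistemic uncertainty, whereas a sample with no evidence yields the highest epistemic uncertainty of the three. This is the kind of distinction that lets epistemic uncertainty temper an aleatoric score under domain shift; how FEAL actually combines the two is defined in the paper, not in this sketch.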
