Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation

Machine learning (ML) and artificial intelligence (AI) have fueled remarkable advances, particularly in healthcare. In medical imaging, ML models promise to improve disease diagnosis, treatment planning, and post-treatment monitoring. Computer vision tasks such as image classification, object detection, and image segmentation are poised to become routine in clinical analysis. However, privacy concerns surrounding patient data hinder the assembly of the large training datasets needed to develop accurate, robust, and generalizable models. Federated Learning (FL) has emerged as a compelling solution. It enables organizations to collaborate on ML model training by sharing model training information (gradients) rather than data (e.g., medical images). FL's distributed learning framework thus facilitates inter-institutional collaboration while preserving patient privacy. Nevertheless, although FL is robust in privacy preservation, it faces several challenges. Sensitive information can still be inferred from the gradients that organizations exchange during model training. In addition, accurately quantifying model confidence/uncertainty is crucial in medical imaging because of the noise and artifacts present in the data. Uncertainty estimation in FL, in turn, encounters unique hurdles due to data heterogeneity across organizations. This paper offers a comprehensive review of FL, privacy preservation, and uncertainty estimation, with a focus on medical imaging. Alongside a survey of current research, we identify gaps in the field and suggest future directions for FL research to enhance privacy and address the challenges posed by noisy medical imaging data.

翻译：机器学习（ML）与人工智能（AI）推动了显著的技术进步，尤其在医疗健康领域。在医学影像中，ML模型有望改善疾病诊断、治疗规划及治疗后监测。图像分类、目标检测和图像分割等多种计算机视觉任务正逐步成为临床分析的常规手段。然而，围绕患者数据的隐私问题阻碍了构建大规模训练数据集，而这些数据集对于开发并训练准确、鲁棒且可泛化的模型至关重要。联邦学习（FL）作为一种极具前景的解决方案应运而生，它使各机构能够通过共享模型训练信息（梯度）而非数据（如医学影像）来协作进行ML模型训练。FL的分布式学习框架促进了跨机构合作，同时保护了患者隐私。然而，FL虽然在隐私保护方面表现稳健，仍面临若干挑战。在模型训练期间，机构间传递的共享梯度仍可能泄露敏感信息。此外，在医学影像中，由于数据中存在噪声和伪影，准确量化模型置信度/不确定性至关重要。由于各机构间的数据异质性，FL中的不确定性估计遇到了独特的障碍。本文对FL、隐私保护及不确定性估计进行了全面综述，并聚焦于医学影像领域。在梳理当前研究的同时，我们指出了该领域的空白，并为未来的FL研究提出了增强隐私保护及应对噪声医学影像数据挑战的方向。