Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging

Artificial intelligence (AI) models are increasingly used in the medical domain. However, because medical data is highly sensitive, special precautions are required to protect it. The gold standard for privacy preservation is the introduction of differential privacy (DP) into model training. Prior work indicates that DP degrades model accuracy and fairness, which is unacceptable in medicine and represents a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models on accuracy and fairness compared to non-private training. For this, we used two datasets: (1) a large dataset (N=193,311) of high-quality clinical chest radiographs, and (2) a dataset (N=1,625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that, while privacy-preserving training yielded lower accuracy, it largely did not amplify discrimination against age, sex or co-morbidity. Our study shows that, under the challenging realistic circumstances of a real-life clinical dataset, privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
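
The three metrics named in the abstract (AUROC, Pearson's r, and Statistical Parity Difference) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy inputs, and the choice of plain Python over a statistics library are assumptions made for illustration, and the abstract does not specify how each metric was applied to the patient subgroups.

```python
from math import sqrt


def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def statistical_parity_difference(predictions, groups, group_a, group_b):
    """P(prediction = 1 | group_a) - P(prediction = 1 | group_b).
    A value of 0 means both groups receive positive predictions equally often."""
    def positive_rate(group):
        members = [p for p, g in zip(predictions, groups) if g == group]
        if not members:
            raise ValueError(f"no samples in group {group!r}")
        return sum(members) / len(members)
    return positive_rate(group_a) - positive_rate(group_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy values only; these are not results from the study.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(statistical_parity_difference(
        [1, 0, 1, 1, 0, 0], ["F", "F", "F", "M", "M", "M"], "F", "M"))
    print(pearson_r([1, 2, 3], [2, 4, 6]))
```

In a comparison like the one described, each metric would be computed once for the non-private model and once for the DP model, so that the gap between the two quantifies the privacy-utility or privacy-fairness trade-off.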

