Artificial intelligence (AI) models are increasingly used in the medical domain. However, because medical data are highly sensitive, special precautions are required to protect them. The gold standard for privacy preservation is the incorporation of differential privacy (DP) into model training. However, prior work has shown that DP has negative effects on model accuracy and fairness. The purpose of this study is therefore to demonstrate that AI models for chest radiograph diagnosis can be trained in a privacy-preserving manner while achieving accuracy and fairness comparable to those of non-private training. A total of 193,311 high-quality clinical chest radiographs were retrospectively collected and manually labeled by experienced radiologists, who assigned one or more of the following diagnoses to each side of the chest (where applicable): cardiomegaly, congestion, pleural effusion, pneumonic infiltration, and atelectasis. The non-private AI models were compared with privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as the area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or statistical parity difference. The non-private AI model achieved an average AUROC of 0.90 over all labels, whereas the DP AI model with a privacy budget of ε = 7.89 achieved an AUROC of 0.87, i.e., a performance decrease of only 2.6% compared with non-private training. Privacy-preserving training of diagnostic AI models can thus achieve high performance with only a small penalty in accuracy, and it does not amplify discrimination based on age, sex, or co-morbidity. We therefore encourage practitioners to integrate state-of-the-art privacy-preserving techniques into medical AI model development.
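The abstract does not describe how its utility and fairness metrics are computed, so the following is only a generic sketch of the two standard definitions it names, not the study's evaluation code. AUROC is computed in its Mann-Whitney form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. Statistical parity difference is taken as the difference in positive-prediction rates between two subgroups (for example, female vs. male patients). The decision threshold of 0.5, the group labels "F" and "M", and the toy scores are hypothetical and are not data from the study.

```python
from collections.abc import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def statistical_parity_difference(
    scores: Sequence[float],
    groups: Sequence[str],
    unprivileged: str,
    privileged: str,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> float:
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged); 0 means parity."""

    def positive_rate(group: str) -> float:
        # Binarize scores for this subgroup and return the fraction predicted positive.
        preds = [s >= threshold for s, g in zip(scores, groups) if g == group]
        if not preds:
            raise ValueError(f"no samples for group {group!r}")
        return sum(preds) / len(preds)

    return positive_rate(unprivileged) - positive_rate(privileged)


if __name__ == "__main__":
    # Toy example for one label (not study data).
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
    groups = ["F", "M", "F", "M", "F", "M"]
    print(round(auroc(labels, scores), 3))  # → 0.889
    print(round(statistical_parity_difference(scores, groups, "F", "M"), 3))  # → 0.333
```

The pairwise loop costs O(positives × negatives) and is meant only for illustration. At the scale of 193,311 radiographs, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice. Averaging the per-label AUROC values over all diagnoses would give a summary figure analogous to the "average AUROC over all labels" reported above.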