Protecting patient data privacy is a critical concern when deploying machine learning algorithms in healthcare. Differential privacy (DP) is a common method for preserving privacy in such settings, and in this work we examine two key trade-offs in applying DP to the NLP task of medical coding (ICD classification). Regarding the privacy-utility trade-off, we observe a significant performance drop in the privacy-preserving models, with more than a 40% reduction in micro F1 scores on the top 50 labels in the MIMIC-III dataset. From the perspective of the privacy-fairness trade-off, we also observe an increase of over 3% in the recall gap between male and female patients in the DP models. Further understanding of these trade-offs will help address the challenges of real-world deployment.