Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML models that satisfy Differential Privacy (DP). Previous works exploring private COVID-19 models are in part based on small datasets, provide weaker or unclear privacy guarantees, and do not investigate practical privacy. We suggest improvements to address these open gaps. We account for inherent class imbalances and evaluate the utility-privacy trade-off more extensively and over stricter privacy budgets. Our evaluation is supported by empirically estimating practical privacy through black-box Membership Inference Attacks (MIAs). The introduced DP should help limit leakage threats posed by MIAs, and our practical analysis is the first to test this hypothesis on the COVID-19 classification task. Our results indicate that needed privacy levels might differ based on the task-dependent practical threat from MIAs. The results further suggest that with increasing DP guarantees, empirical privacy leakage only improves marginally, and DP therefore appears to have a limited impact on practical MIA defense. Our findings identify possibilities for better utility-privacy trade-offs, and we believe that empirical attack-specific privacy estimation can play a vital role in tuning for practical privacy.
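To make the notion of empirically estimated practical privacy concrete, the following is a minimal sketch of a confidence-threshold black-box MIA and of how its error rates can be read as a lower bound on the privacy parameter. It is an illustration under stated assumptions, not the attack or estimator used in this work: the function names, the fixed threshold, and the toy confidence values are all hypothetical. The bound relies on the standard fact that any (ε, δ)-DP mechanism forces an attacker's true positive rate to satisfy TPR ≤ e^ε · FPR + δ, and it uses point estimates only, without confidence intervals.

```python
import math


def threshold_mia(member_conf, nonmember_conf, threshold):
    """Hypothetical black-box attack: guess 'member' when the model's
    confidence on a sample is at least `threshold`.

    Returns the attack's (TPR, FPR) on the given samples.
    """
    tp = sum(c >= threshold for c in member_conf)
    fp = sum(c >= threshold for c in nonmember_conf)
    return tp / len(member_conf), fp / len(nonmember_conf)


def empirical_epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Point-estimate lower bound on epsilon implied by an attack.

    (eps, delta)-DP implies TPR <= exp(eps) * FPR + delta, hence
    eps >= ln((TPR - delta) / FPR). Sampling error is ignored here
    (an assumption of this sketch).
    """
    if fpr <= 0 or tpr - delta <= 0:
        return None  # bound is not computed from these point estimates
    return max(0.0, math.log((tpr - delta) / fpr))


if __name__ == "__main__":
    # Toy confidences (made up for illustration only).
    members = [0.99, 0.95, 0.90, 0.60]
    nonmembers = [0.97, 0.70, 0.55, 0.50]

    tpr, fpr = threshold_mia(members, nonmembers, threshold=0.9)
    print("attack advantage (TPR - FPR):", tpr - fpr)
    print("empirical epsilon lower bound:", empirical_epsilon_lower_bound(tpr, fpr))
```

Comparing such an attack-specific estimate against the theoretical budget is one way to see whether a tighter DP guarantee actually reduces measurable leakage for a given task.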