Differential privacy (DP) provides formal protection for sensitive data but typically incurs substantial losses in diagnostic performance. Model initialization has emerged as a critical factor in mitigating this degradation, yet the role of modern self-supervised learning under full-model DP remains poorly understood. Here, we present a large-scale evaluation of initialization strategies for differentially private medical image analysis, using chest radiograph classification as a representative benchmark with more than 800,000 images. Using state-of-the-art ConvNeXt models trained with DP-SGD across realistic privacy regimes, we compare non-domain-specific supervised ImageNet initialization, non-domain-specific self-supervised DINOv3 initialization, and domain-specific supervised pretraining on MIMIC-CXR, the largest publicly available chest radiograph dataset. Evaluations are conducted across five external datasets spanning diverse institutions and acquisition settings. We show that DINOv3 initialization consistently improves diagnostic utility relative to ImageNet initialization under DP, but remains inferior to domain-specific supervised pretraining, which achieves performance closest to non-private baselines. We further demonstrate that initialization choice strongly influences demographic fairness, cross-dataset generalization, and robustness to data scale and model capacity under privacy constraints. The results establish initialization strategy as a central determinant of utility, fairness, and generalization in differentially private medical imaging.
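For readers unfamiliar with DP-SGD, the following is a minimal, illustrative sketch of a single update step in plain Python: per-example gradients are clipped to a fixed L2 norm, summed, perturbed with Gaussian noise, and averaged. It is not the training code used in this study. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for illustration, and privacy accounting (tracking the privacy budget spent over many steps) is omitted.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not the paper's code).

    weights: list of floats (flattened model parameters)
    per_example_grads: list of gradient vectors, one per training example
    lr: learning rate
    clip_norm: maximum L2 norm allowed for each per-example gradient
    noise_multiplier: Gaussian noise std is noise_multiplier * clip_norm
    rng: a random.Random instance, passed in for reproducibility
    """
    dim = len(weights)
    summed = [0.0] * dim

    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if its norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale

    batch_size = len(per_example_grads)
    sigma = noise_multiplier * clip_norm

    # Add independent Gaussian noise to each coordinate of the clipped sum,
    # then average over the batch.
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]

    # Plain gradient-descent update using the noisy averaged gradient.
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.6, 0.8]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single example can change the update, and the noise scale is tied to that bound. This is why initialization matters under DP: a model that starts closer to a good solution needs fewer noisy updates to reach useful performance.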