Recent work has shown that selecting a model architecture suited to the differential privacy setting is necessary to achieve the best possible utility for a given privacy budget when training with differentially private stochastic gradient descent (DP-SGD) (Tramer and Boneh 2020; Cheng et al. 2022). In light of these findings, we empirically analyse how fairness notions from three distinct classes of statistical fairness criteria (independence, separation and sufficiency) are affected when one selects a model architecture that is suitable for DP-SGD and optimized for utility. Using standard datasets from the ML fairness literature and a rigorous experimental protocol, we show that when the optimal model architecture for DP-SGD is selected, the between-group differences in the relevant fairness metrics (demographic parity, equalized odds and predictive parity) more often decrease or change negligibly relative to a non-private baseline whose architecture has also been selected to maximize utility. These findings challenge the understanding that differential privacy will necessarily exacerbate unfairness in deep learning models trained on biased datasets.
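To make the three group-difference metrics named in the abstract concrete, the following is a minimal sketch of how such gaps could be computed for a binary classifier. It is not the authors' evaluation code: the function names `group_rates` and `fairness_gaps` are hypothetical, and the choice to report each gap as the maximum minus the minimum across groups (and equalized odds as the larger of the TPR and FPR gaps) is an assumption, since the abstract does not specify the exact definitions used.

```python
from typing import Dict, Hashable, Sequence


def _rate(num: int, den: int) -> float:
    """Safe ratio; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def group_rates(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Per-group positive rate, TPR, FPR and precision for binary labels."""
    rates: Dict[Hashable, Dict[str, float]] = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        rates[g] = {
            "positive_rate": _rate(tp + fp, len(idx)),  # independence
            "tpr": _rate(tp, tp + fn),                  # separation
            "fpr": _rate(fp, fp + tn),                  # separation
            "ppv": _rate(tp, tp + fp),                  # sufficiency
        }
    return rates


def fairness_gaps(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[Hashable]
) -> Dict[str, float]:
    """Assumed gap definition: max minus min of each rate across groups."""
    rates = group_rates(y_true, y_pred, group)

    def gap(key: str) -> float:
        values = [r[key] for r in rates.values()]
        return max(values) - min(values)

    return {
        "demographic_parity_gap": gap("positive_rate"),
        "equalized_odds_gap": max(gap("tpr"), gap("fpr")),
        "predictive_parity_gap": gap("ppv"),
    }
```

Under these assumptions, one would compute `fairness_gaps` once for the DP-SGD model and once for the non-private baseline on the same test set, then compare the two dictionaries entry by entry to see whether each gap decreased, increased, or stayed roughly the same.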