Protecting sensitive data becomes more vital as data increases in value and potency. At the same time, regulators and society are putting growing pressure on model developers to make their Artificial Intelligence (AI) models non-discriminatory. In addition, high-stakes tasks call for interpretable, transparent AI models. Measuring the fairness of an AI model generally requires the sensitive attributes of the individuals in the dataset, which raises privacy concerns. In this work, we further explore the trade-offs between fairness, privacy and interpretability. Specifically, we examine the Statistical Parity (SP) of Decision Trees (DTs) under Differential Privacy (DP), each of which is popular in its respective subfield. We propose a novel method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER), that can estimate SP in a DP-aware manner for DTs. The method relies on a third-party legal entity that securely holds the sensitive data, and DP guarantees privacy by adding noise to that data. We experimentally compare several DP mechanisms. We show that, using the Laplacian mechanism, the method is able to estimate SP with low error while guaranteeing the privacy of the individuals in the dataset with high certainty. We further show, both experimentally and theoretically, that the method performs better for DTs that humans generally find easier to interpret.
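To make the idea of estimating SP from noisy counts concrete, the following is a minimal Python sketch. It is not the PAFER algorithm itself, whose details the abstract does not give. It assumes two sensitive groups coded 0 and 1, takes SP to be the absolute difference in positive-prediction rates (one common formulation, chosen here as an assumption), and assumes add-or-remove-one-record neighbouring datasets, so that a histogram of counts has L1 sensitivity 1. The function names `laplace_counts` and `estimate_sp` are hypothetical.

```python
import numpy as np


def laplace_counts(counts, epsilon, rng):
    """Release a histogram of counts with Laplace noise of scale 1/epsilon.

    Assumption: each individual falls into exactly one cell, so adding or
    removing one person changes the histogram by at most 1 in L1 norm.
    """
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Clipping is post-processing of the noisy release, so it does not
    # weaken the privacy guarantee.
    return np.clip(noisy, 0.0, None)


def estimate_sp(y_pred, sensitive, epsilon, rng):
    """Estimate the SP difference between groups 0 and 1 from noisy counts.

    In the setting described above, only the party holding `sensitive`
    would run this and return the result to the model developer.
    """
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    # 2x2 histogram: rows are sensitive groups, columns are predictions.
    hist = np.array(
        [[np.sum((sensitive == g) & (y_pred == p)) for p in (0, 1)] for g in (0, 1)]
    )
    noisy = laplace_counts(hist, epsilon, rng)
    # Guard against division by a near-zero noisy group size.
    totals = np.maximum(noisy.sum(axis=1), 1.0)
    positive_rates = noisy[:, 1] / totals
    return float(abs(positive_rates[0] - positive_rates[1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy predictions, e.g. from a decision tree, with a binary sensitive attribute.
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
    sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
    print(estimate_sp(y_pred, sensitive, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is the trade-off the experiments in this work examine. For a decision tree, the same kind of count query could in principle be posed per rule (leaf) rather than over all predictions at once.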