While in-processing fairness approaches show promise in mitigating biased predictions, their potential impact on privacy leakage remains under-explored. We aim to address this gap by assessing the privacy risks of fairness-enhanced binary classifiers via membership inference attacks (MIAs) and attribute inference attacks (AIAs). Surprisingly, our results reveal that enhancing fairness does not necessarily lead to privacy compromises. In fact, fairness-enhanced models exhibit increased resilience against MIAs and AIAs. This is because fairness interventions tend to remove sensitive information from the extracted features and to reduce the confidence scores on the majority of the training data in pursuit of fairer predictions. However, during these evaluations, we uncover a potential threat mechanism that exploits prediction discrepancies between fair and biased models, leading to stronger attack results for both MIAs and AIAs. This mechanism exposes potent vulnerabilities in fair models and poses significant privacy risks to current fairness methods. Extensive experiments across multiple datasets, attack methods, and representative fairness approaches confirm our findings and demonstrate the efficacy of the uncovered mechanism. Our study exposes the under-explored privacy threats in fairness studies, advocating for thorough evaluations of potential security vulnerabilities before model deployment.
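To make the discrepancy-based threat concrete, below is a minimal, illustrative sketch (not the paper's exact attack) of how prediction discrepancies between a biased baseline model and a fairness-enhanced model could be turned into a membership-inference signal. The dataset, the reweighing stand-in for an in-processing fairness intervention, and the scoring rule are all assumptions made for illustration.

```python
# Minimal sketch: discrepancy-based membership inference.
# Assumptions: synthetic data, logistic regression, and a simple group-reweighing
# stand-in for an in-processing fairness intervention (real methods differ).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: features X, label y, binary sensitive attribute s correlated with y.
n = 4000
s = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5)) + s[:, None] * 0.8
y = (X[:, 0] + 0.5 * s + rng.normal(scale=1.0, size=n) > 0.5).astype(int)

train = np.arange(n) < n // 2  # members of the training set

# "Biased" baseline: plain empirical risk minimization.
biased = LogisticRegression(max_iter=1000).fit(X[train], y[train])

# Fairness-enhanced stand-in: reweigh samples so each (s, y) group has equal influence.
w = np.ones(train.sum())
for sv in (0, 1):
    for yv in (0, 1):
        m = (s[train] == sv) & (y[train] == yv)
        w[m] = 1.0 / max(m.mean(), 1e-6)
fair = LogisticRegression(max_iter=1000).fit(X[train], y[train], sample_weight=w)

# Attack signal: per-sample discrepancy between the two models' confidence scores.
def discrepancy(model_a, model_b, X_):
    pa = model_a.predict_proba(X_)[:, 1]
    pb = model_b.predict_proba(X_)[:, 1]
    return np.abs(pa - pb)

scores = discrepancy(biased, fair, X)
membership = train.astype(int)
# If the intervention reshapes predictions mainly on the training data, the discrepancy
# separates members from non-members; AUC above 0.5 indicates membership leakage.
print("discrepancy-based MIA AUC:", roc_auc_score(membership, scores))
```

The same discrepancy signal could, in principle, be fed to an attribute-inference attacker instead of a thresholded membership score; how far this toy setup reflects the behavior reported for the paper's actual attack pipeline is an open assumption.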