In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has been widely adopted in real-world applications. With LDP, users perturb their data on their own devices before sending it out for analysis. However, as collecting multiple sensitive attributes becomes more common across industries, applying LDP to a single sensitive attribute alone may not be sufficient, because correlated attributes in the data can still allow inferences about that sensitive attribute. This paper empirically studies how collecting multiple sensitive attributes under LDP affects fairness. We propose a novel privacy budget allocation scheme that accounts for the differing domain sizes of the sensitive attributes. In our experiments, this scheme generally achieved a better privacy-utility-fairness trade-off than the state-of-the-art solution. Our results show that LDP slightly improves fairness in learning problems without significantly affecting model performance. We conduct extensive experiments on three benchmark datasets, using several group fairness metrics and seven state-of-the-art LDP protocols. Overall, this study challenges the common belief that differential privacy necessarily worsens fairness in machine learning.
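To make the setting concrete, here is a minimal Python sketch, not the paper's implementation. It assumes generalized randomized response (GRR, also called k-RR), one standard LDP protocol, applied independently to each sensitive attribute. It compares a uniform budget split (ε/d per attribute) with a hypothetical split proportional to each attribute's domain size k_j. The proportional rule and all function names are illustrative assumptions; this abstract does not give the paper's exact allocation formula. In both cases the per-attribute budgets sum to ε, so by sequential composition the full report satisfies ε-LDP.

```python
import math
import random


def grr_perturb(value, domain, epsilon, rng=random):
    """Generalized randomized response (GRR / k-RR).

    Keeps the true value with probability p = e^eps / (e^eps + k - 1);
    otherwise reports one of the other k - 1 domain values uniformly at random.
    """
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def uniform_split(epsilon, domain_sizes):
    """Baseline: every attribute gets the same share eps / d."""
    d = len(domain_sizes)
    return [epsilon / d] * d


def domain_size_split(epsilon, domain_sizes):
    """Hypothetical illustration: budget proportional to domain size k_j."""
    total = sum(domain_sizes)
    return [epsilon * k / total for k in domain_sizes]


def perturb_record(record, domains, epsilon, allocate, rng=random):
    """Perturb each attribute of one user's record on the user's device.

    The per-attribute budgets sum to epsilon, so the whole report is
    epsilon-LDP by sequential composition.
    """
    budgets = allocate(epsilon, [len(d) for d in domains])
    return [grr_perturb(v, d, e, rng) for v, d, e in zip(record, domains, budgets)]


if __name__ == "__main__":
    # Hypothetical sensitive attributes with domain sizes 2, 4 and 4.
    domains = [["F", "M"], ["A", "B", "C", "D"], ["<25", "25-45", "45-65", ">65"]]
    record = ["F", "C", "25-45"]
    rng = random.Random(0)
    print(domain_size_split(1.0, [len(d) for d in domains]))
    print(perturb_record(record, domains, 1.0, domain_size_split, rng))
```

For three attributes with domain sizes 2, 4 and 4 and ε = 1, the proportional split gives budgets of about 0.2, 0.4 and 0.4, whereas the uniform split gives 1/3 each. Larger-domain attributes thus receive more budget, which matters because GRR's estimation error grows with the domain size k at a fixed ε.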