Automated decision systems are increasingly used to make consequential decisions in people's lives. Because both the data they process and the decisions they produce are sensitive, their appropriate use requires addressing several ethical concerns, in particular fairness and privacy. Unlike previous work, which focused on centralized differential privacy (DP) or on local DP (LDP) for a single sensitive attribute, this paper examines how LDP affects fairness in the presence of several sensitive attributes (i.e., multi-dimensional data). A detailed empirical analysis on synthetic and benchmark datasets yields three main observations: (1) multi-dimensional LDP is an efficient approach to reducing disparity, (2) the choice of multi-dimensional LDP approach (independent vs. combined) matters only at low privacy guarantees, and (3) the distribution of the outcome Y has an important effect on which group is more sensitive to the obfuscation. Finally, we summarize our findings as recommendations to guide practitioners in adopting effective privacy-preserving practices while maintaining fairness and utility in ML applications.
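To make the "independent vs. combined" distinction concrete, the following is a minimal sketch and not the paper's implementation. It assumes generalized randomized response (GRR) as the LDP mechanism and an even split of the privacy budget across attributes in the independent case; the abstract specifies neither. The function names `grr`, `ldp_independent`, and `ldp_combined` are hypothetical, and sensitive attributes are assumed to be integer-coded with at least two values each.

```python
import numpy as np

def grr(values, k, epsilon, rng):
    """Generalized randomized response over the domain {0, ..., k-1} (assumed mechanism)."""
    values = np.asarray(values)
    # Probability of reporting the true value.
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(values.shape) < p_keep
    # Otherwise report a uniformly random *different* value:
    # adding an offset in 1..k-1 modulo k never returns the original.
    offsets = rng.integers(1, k, size=values.shape)
    return np.where(keep, values, (values + offsets) % k)

def ldp_independent(X, domain_sizes, epsilon, rng):
    """Perturb each sensitive attribute separately with budget epsilon / d (assumed even split)."""
    d = len(domain_sizes)
    cols = [grr(X[:, j], domain_sizes[j], epsilon / d, rng) for j in range(d)]
    return np.stack(cols, axis=1)

def ldp_combined(X, domain_sizes, epsilon, rng):
    """Perturb the joint value of all attributes once, using the full budget epsilon."""
    dims = tuple(domain_sizes)
    # Encode each row as a single index into the Cartesian product of the domains.
    joint = np.ravel_multi_index(tuple(X.T), dims)
    noisy = grr(joint, int(np.prod(dims)), epsilon, rng)
    # Decode the perturbed joint index back into one column per attribute.
    return np.stack(np.unravel_index(noisy, dims), axis=1)

# Toy data: two sensitive attributes with 2 and 3 categories (illustrative only).
rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 2], [1, 0], [0, 2]])
X_ind = ldp_independent(X, (2, 3), epsilon=1.0, rng=rng)
X_comb = ldp_combined(X, (2, 3), epsilon=1.0, rng=rng)
```

Under these assumptions, the independent variant spends epsilon / d on each attribute over a small domain, whereas the combined variant spends the whole epsilon once over the product domain. Both satisfy epsilon-LDP for the full record, but they distribute the noise differently.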