As data-driven and AI-based decision making gains widespread adoption across disciplines, it is crucial that both data privacy and decision fairness are appropriately addressed. Although differential privacy (DP) provides a robust framework for guaranteeing privacy and many methods exist for improving fairness, most prior work treats the two concerns separately. The approaches that do consider privacy and fairness simultaneously typically focus on a single specific learning task, limiting their generalizability. In response, we introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data preprocessing step. SAFES gives users flexibility in navigating the privacy-fairness-utility trade-off. We instantiate SAFES with different DP synthesizers and fairness-aware data preprocessing methods and run extensive experiments on multiple real datasets to examine the privacy-fairness-utility trade-offs of the resulting synthetic data. Empirical evaluations demonstrate that, for a reasonable privacy loss, SAFES-generated synthetic data can achieve significantly improved fairness metrics with relatively low utility loss.