As data-driven and AI-based decision making gains widespread adoption across disciplines, it is crucial that both data privacy and decision fairness are appropriately addressed. While differential privacy (DP) provides a robust framework for guaranteeing privacy, and several widely accepted methods have been proposed for improving fairness, the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, limiting their generalizability. In response, we introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. We illustrate SAFES by combining AIM, a graphical-model-based DP data synthesizer, with a popular fairness-aware data pre-processing transformation. Empirical evaluations on the Adult and COMPAS datasets show that, for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.