Machine learning models are deployed as central components in decision-making and policy operations that directly affect individuals' lives. To act ethically and comply with government regulations, these models need to make fair decisions and protect users' privacy. However, such requirements can come at the cost of reduced performance compared to potentially biased, privacy-leaking counterparts. A trade-off therefore emerges between the fairness, privacy and performance of ML models, and practitioners need a way to quantify this trade-off to inform deployment decisions. In this work we interpret this trade-off as a multi-objective optimization problem and propose PFairDP, a pipeline that uses Bayesian optimization to discover Pareto-optimal points between the fairness, privacy and utility of ML models. We show how PFairDP can be used to replicate known results that were previously achieved through a manual constraint-setting process. We further demonstrate the effectiveness of PFairDP with experiments on multiple models and datasets.
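To make the notion of Pareto-optimal points concrete, the following is a minimal sketch of a Pareto filter over three objectives. It is not the PFairDP pipeline and does not perform Bayesian optimization; it only illustrates which candidate models would count as non-dominated. The objective names (fairness violation, privacy budget epsilon, error), the convention that all three are minimized, and the candidate values are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical objective vector: (fairness_violation, privacy_epsilon, error).
# Assumption for this sketch: all three objectives are minimized.
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidate models, not results from the paper.
candidates: List[Point] = [
    (0.10, 1.0, 0.20),
    (0.05, 1.0, 0.25),
    (0.10, 2.0, 0.20),  # dominated by the first candidate (larger epsilon)
    (0.20, 0.5, 0.30),
    (0.12, 1.5, 0.26),  # dominated by the first candidate on all objectives
]

front = pareto_front(candidates)
for fairness_violation, epsilon, error in front:
    print(fairness_violation, epsilon, error)
```

Each surviving point represents a different compromise: none can be improved on one objective without getting worse on another, which is the kind of set a practitioner would inspect before making a deployment decision.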