Fairness and privacy are two vital pillars of trustworthy machine learning. Despite extensive research on each topic individually, the relationship between fairness and privacy has received significantly less attention. In this paper, we use the information-theoretic measure of Chernoff Information to highlight the data-dependent nature of the relationship among the triad of fairness, privacy, and accuracy. We first define the Noisy Chernoff Difference, a tool that allows us to analyze the relationships within the triad simultaneously. We then show that on synthetic data this quantity behaves in three distinct ways, depending on the distribution of the data. We characterize the data distributions underlying these cases and explore their fairness and privacy implications. Additionally, we show that the Noisy Chernoff Difference acts as a proxy for the steepness of fairness-accuracy curves. Finally, we propose a method for estimating Chernoff Information on data drawn from unknown distributions and use this framework to examine the triad's dynamics on real datasets. This work builds toward a unified understanding of the fairness-privacy-accuracy relationship and highlights its data-dependent nature.
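As background for the measure the abstract relies on: for two discrete distributions $P$ and $Q$, the Chernoff Information is $C(P,Q) = -\min_{0 \le \lambda \le 1} \log \sum_x P(x)^{\lambda} Q(x)^{1-\lambda}$. The following is a minimal illustrative sketch of computing this quantity by grid search over $\lambda$; it is standard background, not the estimation method proposed in the paper.

```python
import numpy as np

def chernoff_information(p, q, num_lambdas=1001):
    """Chernoff information between two discrete distributions with
    full support: C(p, q) = -min_{0<=lam<=1} log sum_x p(x)^lam q(x)^(1-lam).
    Approximated here by a grid search over lam in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    lams = np.linspace(0.0, 1.0, num_lambdas)
    # Log of the Chernoff coefficient for each candidate lam.
    log_coeffs = [np.log(np.sum(p**lam * q**(1.0 - lam))) for lam in lams]
    return -min(log_coeffs)

# Identical distributions carry no distinguishing information (C = 0);
# the further apart the distributions, the larger C becomes.
print(chernoff_information([0.5, 0.5], [0.5, 0.5]))  # 0 (up to grid error)
print(chernoff_information([0.5, 0.5], [0.9, 0.1]))  # strictly positive
```

Note that the measure is symmetric, since swapping $P$ and $Q$ corresponds to replacing $\lambda$ with $1-\lambda$; a finer grid or a one-dimensional optimizer can replace the grid search when more precision is needed.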