Fairness and privacy are two vital pillars of trustworthy machine learning. Despite extensive research on each topic individually, the relationship between them has received far less attention. In this paper, we use an information-theoretic measure, Chernoff Information, to characterize the fundamental trade-off among fairness, privacy, and accuracy induced by the input data distribution. We first propose Chernoff Difference, a notion of data fairness, along with its noisy variant, Noisy Chernoff Difference, which lets us analyze fairness and privacy simultaneously. Through simple Gaussian examples, we show that Noisy Chernoff Difference exhibits three qualitatively distinct behaviors depending on the underlying data distribution. To extend this analysis beyond synthetic settings, we develop the Chernoff Information Neural Estimator (CINE), the first neural-network-based estimator of Chernoff Information for unknown distributions, and apply it to analyze Noisy Chernoff Difference on real-world datasets. Together, these results fill a critical gap in the literature by providing a principled, data-dependent characterization of how fairness and privacy interact.
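The abstract does not state any formulas, so the following is only a minimal sketch, not the paper's code, of the quantity it builds on. It computes Chernoff Information C(P, Q) = -min over alpha in [0, 1] of log ∫ p(x)^alpha q(x)^(1-alpha) dx for two univariate Gaussians, using the standard closed form of the Chernoff coefficient and a ternary search over alpha (the objective is convex in alpha). Modelling privacy noise as independent additive Gaussian noise with variance `tau2` is an assumption made here for illustration; it is not the paper's definition of Noisy Chernoff Difference. The function names are hypothetical.

```python
import math


def log_chernoff_coefficient(alpha, mu1, var1, mu2, var2):
    """log of the integral of p^alpha * q^(1 - alpha) for p = N(mu1, var1), q = N(mu2, var2)."""
    v = alpha * var2 + (1.0 - alpha) * var1  # variance blended with the weights paired to the exponents
    mean_term = alpha * (1.0 - alpha) * (mu1 - mu2) ** 2 / (2.0 * v)
    var_term = 0.5 * (math.log(v) - (1.0 - alpha) * math.log(var1) - alpha * math.log(var2))
    return -(mean_term + var_term)


def chernoff_information(mu1, var1, mu2, var2, iters=200):
    """Return (C(P, Q), alpha*) by minimizing the convex log-coefficient over [0, 1]."""
    f = lambda a: log_chernoff_coefficient(a, mu1, var1, mu2, var2)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        # Ternary search: keep the two thirds of the interval that contain the minimum.
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    alpha_star = 0.5 * (lo + hi)
    return -f(alpha_star), alpha_star


if __name__ == "__main__":
    # Equal variances: the closed form is (mu1 - mu2)^2 / (8 * var), attained at alpha* = 1/2.
    c, a = chernoff_information(0.0, 1.0, 2.0, 1.0)
    print(round(c, 6), round(a, 3))  # 0.5 0.5

    # Assumed noise model: adding independent N(0, tau2) noise to both groups inflates
    # each variance by tau2, so the two distributions become harder to tell apart.
    for tau2 in (0.0, 1.0, 3.0):
        c_noisy, _ = chernoff_information(0.0, 1.0 + tau2, 2.0, 1.0 + tau2)
        print(tau2, round(c_noisy, 6))
```

In this equal-variance example the values shrink as `tau2` grows, matching 4 / (8 * (1 + tau2)); this illustrates, under the stated assumption, why adding noise for privacy can reduce how distinguishable the group-conditional distributions are.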