Privacy and algorithmic fairness have become two central issues in modern machine learning. Although each has emerged as a rapidly growing research area in its own right, their joint effect remains comparatively under-explored. In this paper, we systematically study the joint impact of differential privacy and fairness on classification in a federated setting, where data are distributed across multiple servers. Targeting demographic-disparity-constrained classification under federated differential privacy, we propose a two-step algorithm, FDP-Fair. In the special case of a single server, we further propose a simple yet powerful algorithm, CDP-Fair, which serves as a computationally lightweight alternative. Under mild structural assumptions, we establish theoretical guarantees on privacy, fairness, and excess risk control. In particular, we decompose the private fairness-aware excess risk into four sources: a) the intrinsic cost of classification, b) the cost of private classification, c) the non-private cost of fairness, and d) the private cost of fairness. Our theoretical findings are complemented by extensive numerical experiments on both synthetic and real datasets, which highlight the practicality of the proposed algorithms.
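To make the two constraints in the abstract concrete, the following minimal sketch shows one common way to measure demographic disparity (the gap in positive-prediction rates between two groups) and one standard way to release a positive-prediction rate under differential privacy (the Laplace mechanism). It is not FDP-Fair or CDP-Fair. The function names, the binary group encoding, and the replace-one notion of neighbouring datasets are all assumptions made for illustration.

```python
import random


def positive_rate(preds):
    """Fraction of predictions equal to 1."""
    return sum(preds) / len(preds)


def demographic_disparity(preds, groups):
    """Absolute gap in positive-prediction rates between group 1 and group 0.

    Assumes binary predictions and a binary sensitive attribute (illustrative).
    """
    group_1 = [p for p, g in zip(preds, groups) if g == 1]
    group_0 = [p for p, g in zip(preds, groups) if g == 0]
    return abs(positive_rate(group_1) - positive_rate(group_0))


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale).

    The difference of two i.i.d. exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_positive_rate(preds, epsilon, rng):
    """Laplace-mechanism release of a positive-prediction rate.

    Assumption: neighbouring datasets differ by replacing one record,
    so the mean of n binary values has sensitivity 1/n and the noise
    scale is 1 / (n * epsilon).
    """
    n = len(preds)
    return positive_rate(preds) + laplace_noise(1.0 / (n * epsilon), rng)


# Toy data: group 1 receives positive predictions 3/4 of the time, group 0 only 1/4.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
print(demographic_disparity(preds, groups))  # 0.5

rng = random.Random(0)
print(private_positive_rate(preds, epsilon=1.0, rng=rng))  # noisy version of 0.5
```

A fairness-aware classifier would choose its decision rule so that the disparity stays below a prescribed level. A private one would only access such statistics through noisy releases, which is where the private costs listed in the abstract come from.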