Machine learning models are often trained on sensitive data (e.g., medical records and race/gender) that is distributed across different "silos" (e.g., hospitals). These federated learning models may then be used to make consequential decisions, such as allocating healthcare resources. Two key challenges arise in this setting: (i) maintaining the privacy of each person's data, even if other silos or an adversary with access to the central server try to infer it; and (ii) ensuring that decisions are fair to different demographic groups (e.g., race/gender). In this paper, we develop a novel algorithm for private and fair federated learning (FL). Our algorithm satisfies inter-silo record-level differential privacy (ISRL-DP), a strong notion of private FL requiring that the messages sent by each silo i satisfy record-level differential privacy. Our framework can be used to promote different fairness notions, including demographic parity and equalized odds. We prove that our algorithm converges under mild smoothness assumptions on the loss function, whereas prior work required strong convexity for convergence. As a byproduct of our analysis, we obtain the first convergence guarantee for ISRL-DP nonconvex-strongly concave min-max FL. Experiments demonstrate the state-of-the-art fairness-accuracy tradeoffs of our algorithm across different privacy levels.
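To make the two ingredients concrete, here is a minimal sketch (not the paper's algorithm) of how a silo might privatize a local update in the ISRL-DP spirit, and how a demographic-parity gap can be measured. The clipping norm, noise multiplier, and function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def isrl_dp_gradient(per_record_grads, clip_norm, noise_mult, rng):
    """Sketch of a record-level-private silo update: clip each record's
    gradient to bound sensitivity, average, then add Gaussian noise
    before the silo sends the message. Parameters are illustrative."""
    clipped = []
    for g in per_record_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Noise scale follows the usual Gaussian-mechanism pattern:
    # sensitivity of the mean is clip_norm / n.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_record_grads),
                       size=mean.shape)
    return mean + noise

def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rates across groups;
    demographic parity asks this gap to be (near) zero."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)
```

With `noise_mult = 0` the update reduces to the clipped mean; increasing it trades accuracy for privacy, which is the tension the abstract's fairness-accuracy-privacy tradeoff refers to.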