Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study four statistical problems: univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation. By investigating the minimax rates and quantifying the cost of privacy, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.