Federated learning is increasingly popular, and data heterogeneity and privacy are two of its prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of \textit{federated differential privacy}, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems: univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.