Federated learning is increasingly popular, and data heterogeneity and privacy are two of its prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of \textit{federated differential privacy}, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems: univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for both data heterogeneity and privacy, highlighting the fundamental costs of each in federated learning and underscoring the benefit of knowledge transfer across data sets.