This paper studies federated learning for nonparametric regression with samples distributed across multiple servers, each subject to its own differential privacy constraint. The setting is heterogeneous: both the sample sizes and the differential privacy constraints vary across servers. Within this framework, both global and pointwise estimation are considered, and optimal rates of convergence over Besov spaces are established. Distributed privacy-preserving estimators are proposed and their risk properties are investigated. Matching minimax lower bounds, up to a logarithmic factor, are established for both global and pointwise estimation. Together, these findings shed light on the tradeoff between statistical accuracy and privacy preservation. In particular, we characterize the tradeoff not only in terms of the privacy budget but also in terms of the loss incurred by distributing the data within the privacy framework as a whole. These results capture the folklore wisdom that privacy is easier to preserve in larger samples, and they highlight the differences between pointwise and global estimation under distributed privacy constraints.
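To make the mechanism behind such distributed privacy-preserving estimation concrete, the following is a minimal illustrative sketch, not the estimator analyzed in this paper: each server releases a clipped local average through the standard Laplace mechanism, calibrated to its own sample size n_j and privacy budget eps_j, and a central server combines the releases. The clipping bound B, the inverse-variance weighting rule, and all names are assumptions introduced solely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_local_mean(y, eps, B=1.0, rng=rng):
    """Release the mean of responses y (clipped to [-B, B]) under eps-DP.

    Replacing one observation changes the clipped mean of a sample of
    size n by at most 2B / n, so Laplace noise with scale 2B / (n * eps)
    suffices for eps-differential privacy of this single release.
    """
    y = np.clip(y, -B, B)
    n = len(y)
    scale = 2.0 * B / (n * eps)
    return y.mean() + rng.laplace(0.0, scale)

def aggregate(releases, ns, eps_list, B=1.0):
    """Combine the servers' private releases with inverse-variance weights.

    The variance of server j's release is (sampling variance)/n_j plus the
    Laplace noise variance 2 * (2B / (n_j * eps_j))^2; the sampling variance
    is unknown here, so 1/n_j is used as a crude illustrative proxy.
    """
    ns = np.asarray(ns, dtype=float)
    eps = np.asarray(eps_list, dtype=float)
    var = 2.0 * (2.0 * B / (ns * eps)) ** 2 + 1.0 / ns  # proxy variance
    w = (1.0 / var) / np.sum(1.0 / var)
    return float(np.dot(w, releases))

# Heterogeneous servers: different sample sizes and privacy budgets.
ns = [200, 1000, 5000]
eps_list = [0.5, 1.0, 2.0]
true_mean = 0.3
releases = [
    private_local_mean(rng.normal(true_mean, 0.5, size=n), eps)
    for n, eps in zip(ns, eps_list)
]
print("combined private estimate:", aggregate(releases, ns, eps_list))
```

The sketch reflects the qualitative tradeoff described above: the per-server noise scale 2B/(n_j * eps_j) shrinks as the local sample grows, so larger samples retain privacy at a smaller statistical cost, while tighter budgets (smaller eps_j) inflate the noise.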