Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning

Repeated parameter sharing in federated learning leaks significant information about private data, thus defeating its main purpose: data privacy. Mitigating this leakage with state-of-the-art differentially private algorithms does not come for free either. Randomized mechanisms can prevent models from converging even on useful representation functions, especially when local models disagree more strongly on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages the parties to collaboratively refine the consensus part of the model under differential privacy guarantees, while leaving each party sufficient freedom for local personalization (which is never released). We prove that in the linear representation setting, although the objective is non-convex, our proposed algorithm \DPFEDREP\ converges at a linear rate to a ball centered at the \emph{globally optimal} solution, and the radius of the ball is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. We empirically evaluate our method on image classification tasks with CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over prior work under the same small privacy budget. The code is available at https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.
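To make the idea of sharing only the representation concrete, the following is a minimal sketch of one server round in which clients send clipped updates to the shared representation, the server averages them and adds Gaussian noise, and the local heads never leave the clients. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`clip_to_norm`, `private_average`, `server_round`), the flat-list parameter layout, and the noise calibration (standard deviation `noise_multiplier * max_norm / n` on the mean, as in common clip-and-noise schemes) are all hypothetical choices, and no $(\epsilon, \delta)$ accounting is shown.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def private_average(updates, max_norm, noise_multiplier, rng):
    """Average clipped per-client updates and add Gaussian noise.

    Assumption: noise std on the mean is noise_multiplier * max_norm / n,
    the usual calibration when each client contributes one clipped vector.
    """
    n = len(updates)
    dim = len(updates[0])
    clipped = [clip_to_norm(u, max_norm) for u in updates]
    sigma = noise_multiplier * max_norm / n
    return [
        sum(u[j] for u in clipped) / n + rng.gauss(0.0, sigma)
        for j in range(dim)
    ]


def server_round(shared_rep, client_rep_updates, max_norm=1.0,
                 noise_multiplier=1.0, seed=0):
    """Apply one noisy averaged update to the shared representation.

    Only representation updates are passed in; each client's local head
    (the personalized part) is kept on the client and never appears here.
    """
    rng = random.Random(seed)
    noisy_mean = private_average(
        client_rep_updates, max_norm, noise_multiplier, rng
    )
    return [w + g for w, g in zip(shared_rep, noisy_mean)]
```

In this sketch, a call such as `server_round(rep, updates, max_norm=1.0, noise_multiplier=1.0)` returns the new shared representation; each client would then refit its own head locally against that representation. Keeping the head out of the released quantity means noise is added only to the consensus part of the model.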
