Federated Learning (FL) emerged as a learning method that enables a server to train models over data distributed across many clients. Because these clients want to prevent their data from being leaked to the server, to any other client, or to an external adversary, they train the model locally and share it with the server rather than sharing the data. However, sophisticated inference attacks can leak information about the data through access to the model parameters alone. To tackle this challenge, privacy-preserving federated learning aims to achieve differential privacy through learning algorithms such as DP-SGD. Such methods add noise to the model, data, or gradients, which reduces the model's performance. This work provides a theoretical analysis of the tradeoff between model performance and communication complexity of the FL system. We formally prove that training for one local epoch per global round gives optimal performance while preserving the same privacy budget. We also investigate how the utility (tied to privacy) of FL models changes with the number of clients, and argue that when clients train with DP-SGD, the utility improves with more clients under the same privacy budget. We validate our findings through experiments on real-world datasets. The results of this paper aim to improve the performance of privacy-preserving federated learning systems.
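To make the noise-addition mechanism concrete, the following is a minimal sketch of one DP-SGD update as applied on a client: each per-example gradient is clipped to a fixed norm, the clipped gradients are averaged, and calibrated Gaussian noise is added before the parameter step. The function name, parameters, and NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update (illustrative sketch, not the paper's implementation).

    Clips each per-example gradient to clip_norm, averages them, and adds
    Gaussian noise with std noise_multiplier * clip_norm / batch_size.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient down only if its norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads),
        size=avg.shape)
    return params - lr * (avg + noise)
```

The noise scale ties the update to a privacy budget: a larger `noise_multiplier` yields a smaller epsilon per step but a noisier, lower-utility update, which is exactly the performance tradeoff the abstract describes.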