Federated learning has emerged as a promising paradigm for collaborative model training while preserving data privacy. However, recent studies have shown that it is vulnerable to various privacy attacks, such as data reconstruction attacks. In this paper, we provide a theoretical analysis of privacy leakage in federated learning from two perspectives: linear algebra and optimization theory. From the linear algebra perspective, we prove that when the Jacobian matrix of the batch data is not full rank, there exist distinct batches of data that produce the same model update, so the original batch cannot be uniquely reconstructed from the update alone, which guarantees a degree of privacy. We derive a sufficient condition on the batch size that prevents data reconstruction attacks. From the optimization theory perspective, we establish an upper bound on the privacy leakage in terms of the batch size, the distortion extent, and several other factors. Our analysis provides insights into the relationship between privacy leakage and various aspects of federated learning, offering a theoretical foundation for designing privacy-preserving federated learning algorithms.
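The non-injectivity claim can be illustrated with a toy example (this is an illustrative sketch, not the paper's construction): for a linear model with squared loss, the batch gradient is a sum of residual-weighted inputs, so two different batches can collapse to the identical update. The data values below are chosen by hand for this demonstration.

```python
import numpy as np

def batch_grad(w, X, y):
    # Mean-squared-error gradient of a linear model f(x) = w @ x:
    #   g = (1/b) * sum_i (w @ x_i - y_i) * x_i
    r = X @ w - y
    return (X.T @ r) / len(y)

w = np.array([1.0, 1.0])          # current model parameters

# Batch A: two unit-basis points with zero labels.
X_a = np.array([[1.0, 0.0],
                [0.0, 1.0]])
y_a = np.zeros(2)

# Batch B: a rescaled diagonal point plus the origin -- different data,
# constructed so that the residual-weighted sum matches batch A's.
t = 1.0 / np.sqrt(2.0)
X_b = np.array([[t, t],
                [0.0, 0.0]])
y_b = np.zeros(2)

g_a = batch_grad(w, X_a, y_a)
g_b = batch_grad(w, X_b, y_b)
print(g_a, g_b)
assert np.allclose(g_a, g_b)      # identical update from distinct batches
```

Because the map from batch data to gradient is not injective here, an attacker observing only the update cannot decide which batch produced it; this is the same phenomenon the rank-deficiency condition formalizes.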