Defending Against Data Reconstruction Attacks in Federated Learning: An Information Theory Approach

Federated Learning (FL) trains a black-box, high-dimensional model across multiple clients by exchanging parameters rather than sharing data directly, which mitigates the privacy leakage incurred by machine learning. However, FL remains vulnerable to membership inference attacks (MIA) and data reconstruction attacks (DRA). In particular, an attacker can extract information from local datasets by mounting a DRA, which existing techniques such as Differential Privacy (DP) cannot effectively throttle. In this paper, we aim to provide a strong privacy guarantee for FL under DRA. We prove that the reconstruction error under DRA is bounded in terms of the information an attacker acquires, which means that constraining the transmitted information can effectively throttle DRA. To quantify the information leakage incurred by FL, we establish a channel model that depends on an upper bound on the joint mutual information between the local dataset and the multiple transmitted parameters. Moreover, the channel model indicates that the transmitted information can be constrained by operating in the data space, which can improve training efficiency and model accuracy under constrained information. Based on the channel model, we propose algorithms that constrain the information transmitted in a single round of local training. Given a limited number of training rounds, these algorithms ensure that the total amount of transmitted information is bounded. Furthermore, our channel model can be applied to various privacy-enhancing techniques (such as DP) to strengthen their privacy guarantees against DRA. Extensive experiments on real-world datasets validate the effectiveness of our methods.
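
As a rough illustration of the two central claims above, the following sketch uses standard information-theoretic facts rather than the paper's own theorems, which are not stated in this abstract. All symbols are assumptions introduced for the sketch: D is the local dataset, treated as a random vector in R^d with differential entropy h(D); W_1, ..., W_T are the parameters transmitted in rounds 1 through T; \hat{D} is any estimate the attacker computes from those parameters; and C is a hypothetical per-round information budget in nats.

```latex
% (1) Entropy-power style bound: less leaked information forces a larger
%     mean squared reconstruction error for ANY attacker estimate \hat{D}.
\begin{align}
\frac{1}{d}\,\mathbb{E}\,\bigl\|D-\hat{D}\bigr\|^{2}
  &\;\ge\; \frac{1}{2\pi e}\,
     \exp\!\Bigl(\tfrac{2}{d}\bigl[\,h(D) - I(D;W_1,\dots,W_T)\,\bigr]\Bigr)
\\[4pt]
% (2) Chain rule of mutual information: if every round leaks at most C nats
%     (conditioned on earlier rounds), then T rounds leak at most T*C nats.
I(D;W_1,\dots,W_T)
  &\;=\; \sum_{t=1}^{T} I\bigl(D;W_t \,\big|\, W_1,\dots,W_{t-1}\bigr)
  \;\le\; T\,C
\end{align}
```

Under these assumptions, substituting the second line into the first gives a floor on the per-coordinate reconstruction error that depends only on h(D), d, T, and C. This is the sense in which a per-round constraint combined with a limited number of rounds limits what a DRA can recover.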
