Federated learning (FL) allows AI models to be trained collaboratively without sharing raw data. This capability makes it especially attractive for healthcare applications, where patient and data privacy is of utmost concern. However, recent work on inverting deep neural networks from model gradients has raised concerns about whether FL can prevent the leakage of training data. In this work, we show that the attacks presented in the literature are impractical in FL use cases where the clients' training involves updating the Batch Normalization (BN) statistics, and we provide a new baseline attack that works in such scenarios. Furthermore, we present new ways to measure and visualize potential data leakage in FL. Our work is a step towards establishing reproducible methods for measuring data leakage in FL, and it could help determine the optimal tradeoffs between privacy-preserving techniques, such as differential privacy, and model accuracy based on quantifiable metrics. Code is available at https://nvidia.github.io/NVFlare/research/quantifying-data-leakage.
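As background on why shared gradients can leak training data, the sketch below shows a standard textbook case. It is not the attack proposed in this work. It assumes a single fully connected layer, a batch size of 1, a squared-error loss, and no Batch Normalization. Under these conditions the weight gradient is the outer product of the output gradient and the input, and the bias gradient equals the output gradient. Dividing one row of the weight gradient by the matching bias gradient therefore recovers the private input exactly. All function names are hypothetical.

```python
import random

def linear_forward(W, b, x):
    # y = W x + b for a single example (W is a list of rows)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def grads_squared_error(W, b, x, target):
    # Loss L = 0.5 * ||y - target||^2, so dL/dy = y - target
    y = linear_forward(W, b, x)
    dy = [y_i - t_i for y_i, t_i in zip(y, target)]
    # dL/dW[i][j] = dy[i] * x[j]  and  dL/db[i] = dy[i]
    dW = [[dy_i * x_j for x_j in x] for dy_i in dy]
    db = list(dy)
    return dW, db

def invert_input(dW, db, eps=1e-12):
    # Any row i with a nonzero bias gradient yields x = dW[i] / db[i]
    for row, g in zip(dW, db):
        if abs(g) > eps:
            return [v / g for v in row]
    raise ValueError("all bias gradients are zero; input not recoverable this way")

# Demo: a "client" computes gradients on one private input.
# An observer who sees only (dW, db) reconstructs that input.
random.seed(0)
x_private = [0.3, -1.2, 0.7, 2.0]
W = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
b = [0.1, -0.2, 0.05]
dW, db = grads_squared_error(W, b, x_private, target=[1.0, 0.0, -1.0])
x_recovered = invert_input(dW, db)
print(x_recovered)  # equals x_private up to floating-point rounding
```

With larger batches, both gradients are sums over samples, so this ratio returns a weighted mixture of the inputs rather than any single one. Deep networks are also harder to invert than a single linear layer. Practical attacks therefore fall back on iterative optimization, and their feasibility depends on training details such as whether BN statistics are updated.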