Federated learning has emerged as a promising approach for learning from large collections of distributed data while preserving privacy: participants share gradients instead of exchanging raw data. However, recent studies show that private training data can be leaked through a variety of gradient-based attacks. While previous analytical attacks have successfully reconstructed input data from fully connected layers, their effectiveness diminishes when applied to convolutional layers. This paper introduces an advanced data leakage method that efficiently exploits the gradients of convolutional layers. We present a surprising finding: even with activation functions that are not fully invertible, such as ReLU, training samples can be reconstructed analytically from the gradients. To the best of our knowledge, this is the first analytical approach that reconstructs convolutional layer inputs directly from the gradients, bypassing the need to reconstruct the layers' outputs. Prior research has concentrated mainly on the weight constraints of convolutional layers and has overlooked the significance of gradient constraints. Our findings demonstrate that existing analytical methods for estimating the risk of gradient attacks lack accuracy: in some layers, attacks can be launched with fewer than 5% of the reported constraints.
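For context, the fully connected case that earlier analytical attacks rely on can be illustrated with a minimal sketch. It is not the convolutional method introduced in this paper. It assumes a single training sample, a layer of the form y = Wx + b, and a squared-error loss chosen purely for illustration; the function names `fc_gradients` and `reconstruct_input` are hypothetical. Because dL/dW is the outer product of dL/dy and x, while dL/db equals dL/dy, dividing any row of the weight gradient by the matching bias gradient returns x.

```python
def fc_gradients(W, b, x, target):
    """Gradients of 0.5 * ||Wx + b - target||^2 w.r.t. W and b (one sample).

    The squared-error loss is an assumption made for this illustration.
    """
    y = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    grad_y = [yi - ti for yi, ti in zip(y, target)]    # dL/dy
    grad_W = [[g * xj for xj in x] for g in grad_y]    # dL/dW = outer(dL/dy, x)
    grad_b = list(grad_y)                              # dL/db = dL/dy
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the shared gradients: x = grad_W[i] / grad_b[i]."""
    # Use the row with the largest bias gradient for numerical stability.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if abs(grad_b[i]) < eps:
        raise ValueError("bias gradients are all ~0; input is not recoverable")
    return [g / grad_b[i] for g in grad_W[i]]


# Toy values (hypothetical): 3 inputs, 2 outputs.
W = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]]
b = [0.05, -0.1]
x = [1.0, 2.0, -3.0]          # private input held by the client
target = [0.0, 1.0]

grad_W, grad_b = fc_gradients(W, b, x, target)   # what the client would share
x_rec = reconstruct_input(grad_W, grad_b)        # what an attacker could compute
print(x_rec)
```

The recovery needs only one output unit with a non-zero bias gradient. A convolutional layer shares its weights across spatial positions, so this row-by-row division no longer applies directly, which is the gap the abstract describes.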