Understanding when and how much a model gradient leaks information about the training samples is an important question in privacy. In this paper, we present a surprising result: even without training on or memorizing the data, we can fully reconstruct the training samples from a single gradient query at a randomly chosen parameter value. We prove that the training data are identifiable under mild conditions, for both shallow and deep neural networks and for a wide range of activation functions. We also present a statistically and computationally efficient algorithm, based on tensor decomposition, that reconstructs the training data. Because the attack provably reveals sensitive training data, our findings point to potentially severe privacy threats, especially in federated learning.
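To give some intuition for why one gradient query at random parameters can expose the data, the following minimal sketch covers only the simplest case: a single training sample, a one-hidden-layer network with a bias term, and a squared loss. It is not the tensor-decomposition algorithm described above, which handles the general setting. The network sizes, the `tanh` activation, and the function names `single_sample_gradient` and `reconstruct_input` are all assumptions made for illustration. In this toy case the weight gradient is the outer product of the bias gradient with the input, so dividing one row of the weight gradient by the matching bias-gradient entry returns the input.

```python
import numpy as np


def single_sample_gradient(W, b, a, x, y):
    """Gradient of 0.5 * (a . tanh(W x + b) - y)^2 with respect to W and b."""
    z = W @ x + b                      # pre-activations, shape (m,)
    h = np.tanh(z)                     # hidden activations, shape (m,)
    residual = a @ h - y               # scalar prediction error
    delta = residual * a * (1.0 - h ** 2)  # dL/dz, shape (m,)
    grad_W = np.outer(delta, x)        # dL/dW = delta x^T, shape (m, d)
    grad_b = delta                     # dL/db = delta
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b):
    """Recover x from one row of grad_W, using grad_W[j] = grad_b[j] * x."""
    j = np.argmax(np.abs(grad_b))      # pick the best-conditioned row
    return grad_W[j] / grad_b[j]


# Hypothetical toy setup: random parameters, one private sample.
rng = np.random.default_rng(0)
d, m = 5, 8
x_true = rng.normal(size=d)
y_true = 1.0
W = rng.normal(size=(m, d))
b = rng.normal(size=m)
a = rng.normal(size=m)

grad_W, grad_b = single_sample_gradient(W, b, a, x_true, y_true)
x_rec = reconstruct_input(grad_W, grad_b)
print("max abs reconstruction error:", np.max(np.abs(x_rec - x_true)))
```

With a batch of several samples the gradient is a sum of such outer products and this row-division trick no longer applies directly, which is the setting the tensor-decomposition approach is meant to address.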