Understanding when and how much a model gradient leaks information about the training samples is an important question in privacy. In this paper, we present a surprising result: even without training on or memorizing the data, we can fully reconstruct the training samples from a single gradient query at a randomly chosen parameter value. We prove the identifiability of the training data under mild conditions, for shallow or deep neural networks and a wide range of activation functions. We also present a statistically and computationally efficient algorithm based on tensor decomposition to reconstruct the training data. As a provable attack that reveals sensitive training data, our findings suggest potentially severe threats to privacy, especially in federated learning.
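To make the notion of gradient leakage concrete, the sketch below shows a much simpler special case than the tensor-decomposition algorithm described above, and it should not be read as that algorithm. It assumes a single training sample, a single linear layer with a bias term, and squared loss. Under those assumptions the weight gradient is the outer product of the residual and the input, and the bias gradient is the residual itself, so dividing one row of the weight gradient by the matching bias-gradient entry returns the input. All names here (`single_sample_gradient`, `reconstruct_input`, `x_private`) are hypothetical and chosen only for illustration.

```python
import random


def single_sample_gradient(W, b, x, y):
    """Gradient of 0.5 * ||W x + b - y||^2 with respect to W and b."""
    residual = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i - y_i
        for row, b_i, y_i in zip(W, b, y)
    ]
    # grad_W is the outer product residual * x^T; grad_b is the residual.
    grad_W = [[r * x_j for x_j in x] for r in residual]
    grad_b = residual[:]
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b):
    """Recover x by dividing one row of grad_W by the matching grad_b entry."""
    # Use the row with the largest |grad_b| entry for numerical stability.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if grad_b[i] == 0.0:
        raise ValueError("bias gradient is zero; x cannot be recovered this way")
    return [g / grad_b[i] for g in grad_W[i]]


# Randomly chosen parameters (no training) and one private sample.
rng = random.Random(0)
d_in, d_out = 5, 3
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
b = [rng.gauss(0, 1) for _ in range(d_out)]
x_private = [rng.uniform(-1, 1) for _ in range(d_in)]
y_private = [rng.uniform(-1, 1) for _ in range(d_out)]

# The attacker observes only the gradient, not x_private or y_private.
grad_W, grad_b = single_sample_gradient(W, b, x_private, y_private)
x_recovered = reconstruct_input(grad_W, grad_b)
max_error = max(abs(u - v) for u, v in zip(x_private, x_recovered))
```

This shortcut breaks down once gradients are averaged over several samples, because the rows of the weight gradient then mix multiple inputs. Separating such mixtures is the harder setting that the abstract addresses.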