Modern deep learning requires large volumes of data, which may contain sensitive or private information that must not be leaked. Recent work has shown that, for homogeneous neural networks, a large portion of this training data can be reconstructed with access only to the trained network parameters. While the attack was shown to work empirically, there is little formal understanding of its effective regime or of which datapoints are susceptible to reconstruction. In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the \emph{entire training set} in the infinite-width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success depends heavily on deviations from the frozen infinite-width Neural Tangent Kernel limit. Next, we study the nature of easily reconstructed images. We show, both theoretically and empirically, that reconstructed images tend to be "outliers" in the dataset, and that these reconstruction attacks can be used for \textit{dataset distillation}: we can retrain on the reconstructed images and obtain high predictive accuracy.