Modern deep learning requires large volumes of data, which may contain sensitive or private information that must not be leaked. Recent work has shown that, for homogeneous neural networks, a large portion of the training data can be reconstructed given access only to the trained network parameters. While the attack was shown to work empirically, there is little formal understanding of the regime in which it is effective or of ways to defend against it. In this work, we first build a stronger version of the dataset reconstruction attack and show that it can provably recover the entire training set in the infinite-width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success depends heavily on deviations from the frozen infinite-width Neural Tangent Kernel limit. More importantly, we formally show for the first time that dataset reconstruction attacks are a variation of dataset distillation. This key theoretical result unifying dataset reconstruction and distillation not only sheds more light on the characteristics of the attack but also enables us to design defense mechanisms against it via distillation algorithms.