Reconstructing training samples from a trained neural network is a major privacy concern. Haim et al. (2022) recently showed that training samples can be reconstructed from neural network binary classifiers, building on theoretical results about the implicit bias of gradient methods. In this work, we present several improvements and new insights over that work. As our main improvement, we show that training-data reconstruction is possible in the multi-class setting, and that the reconstruction quality is even higher than in binary classification. Moreover, we show that using weight decay during training increases the vulnerability to sample reconstruction. Finally, while the previous work used training sets of at most $1000$ samples from $10$ classes, we show preliminary evidence that reconstruction is possible from a model trained on $5000$ samples from $100$ classes.