Machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples that appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure the extent to which models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets (for instance, those used to pre-train a model) may receive privacy benefits at the expense of examples seen later.
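The abstract does not say which privacy attack underlies the forgetting measurement. As an illustration only, the sketch below uses a simple loss-threshold membership inference attack as a stand-in: for a group of training examples (members) and held-out examples (non-members), it reports the best balanced accuracy of guessing membership from per-example loss. Grouping examples by "steps since last seen" to obtain a forgetting curve is an assumption, and the function names `loss_threshold_mi_accuracy` and `forgetting_curve` are hypothetical, not the authors' method. Under this reading, forgetting shows up as attack accuracy falling toward 0.5 (chance) for examples seen long ago.

```python
from typing import Dict, Sequence, Tuple


def loss_threshold_mi_accuracy(member_losses: Sequence[float],
                               nonmember_losses: Sequence[float]) -> float:
    """Best balanced accuracy of a loss-threshold membership inference attack.

    Hypothetical stand-in attack (not necessarily the one used in the paper):
    predict "member" when an example's loss is <= threshold. Every observed
    loss is tried as a threshold and the best result is kept. A value of 0.5
    means the attack does no better than chance.
    """
    if len(member_losses) == 0 or len(nonmember_losses) == 0:
        raise ValueError("need at least one member and one non-member loss")
    best = 0.5
    for t in sorted(set(member_losses) | set(nonmember_losses)):
        # True positive rate: members correctly flagged (low loss).
        tpr = sum(x <= t for x in member_losses) / len(member_losses)
        # True negative rate: non-members correctly rejected (high loss).
        tnr = sum(x > t for x in nonmember_losses) / len(nonmember_losses)
        best = max(best, 0.5 * (tpr + tnr))
    return best


def forgetting_curve(
    groups: Dict[int, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[int, float]:
    """Map 'steps since last seen' -> attack accuracy for that group.

    Assumed layout: each key is how many training steps ago the member
    examples were last seen; each value is (member_losses, nonmember_losses).
    """
    return {
        steps: loss_threshold_mi_accuracy(members, nonmembers)
        for steps, (members, nonmembers) in sorted(groups.items())
    }


if __name__ == "__main__":
    # Made-up losses purely to show the call pattern; not experimental data.
    toy = {
        0: ([0.1, 0.2, 0.15], [0.9, 1.1, 0.8]),       # just seen: well separated
        5000: ([0.8, 0.9, 1.0], [0.85, 0.95, 1.05]),  # seen long ago: overlapping
    }
    print(forgetting_curve(toy))
```

A threshold on loss is the simplest membership signal; stronger attacks (for example, ones calibrated with shadow models) would slot into the same per-group loop by replacing `loss_threshold_mi_accuracy`.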