Unlearning algorithms aim to remove the influence of deleted data from trained models at a lower cost than full retraining. However, the unlearning guarantees in prior literature are flawed and do not protect the privacy of deleted records. First, we show that when users delete their data as a function of published models, records in a database become interdependent. Consequently, even retraining a fresh model after a record is deleted does not ensure that record's privacy. Second, unlearning algorithms that cache partial computations to speed up deletion processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these issues, we propose a sound deletion guarantee and show that the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an accurate, computationally efficient, and secure machine unlearning algorithm based on noisy gradient descent.
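The abstract does not spell out the algorithm, so the following is only a rough illustration of the general shape of noisy-gradient-descent unlearning, on a toy one-dimensional ridge regression. The model is trained with gradient steps plus Gaussian noise. On each deletion, the record is dropped and a few more noisy steps are run on the remaining data, starting from the current model. The sketch keeps only the current parameter and the remaining records and caches no per-record intermediate state. The loss, step size `eta`, regularizer `lam`, noise scale `sigma`, step counts, and the class `NoisyGDUnlearner` are all hypothetical choices, not the authors' algorithm. No privacy guarantee is derived for these parameter values.

```python
import math
import random
from typing import List, Tuple

Record = Tuple[float, float]  # (feature x, label y); hypothetical toy data format


def ridge_grad(w: float, data: List[Record], lam: float) -> float:
    """Gradient of (1/2n) * sum (w*x - y)^2 + (lam/2) * w^2 for a 1-D linear model."""
    if not data:
        raise ValueError("need at least one record")
    n = len(data)
    return sum((w * x - y) * x for x, y in data) / n + lam * w


def noisy_gd(w: float, data: List[Record], steps: int, eta: float,
             lam: float, sigma: float, rng: random.Random) -> float:
    """Run `steps` gradient-descent iterations, each with added Gaussian noise.
    The noise scale sqrt(2*eta)*sigma is an assumed (Langevin-style) choice."""
    for _ in range(steps):
        noise = math.sqrt(2.0 * eta) * sigma * rng.gauss(0.0, 1.0)
        w = w - eta * ridge_grad(w, data, lam) + noise
    return w


class NoisyGDUnlearner:
    """Hypothetical wrapper: train with noisy GD, then handle each deletion by
    continuing noisy GD on the remaining records from the current model.
    Only the current parameter and the remaining data are stored between releases."""

    def __init__(self, data: List[Record], eta: float = 0.05, lam: float = 0.1,
                 sigma: float = 0.01, train_steps: int = 200,
                 unlearn_steps: int = 50, seed: int = 0) -> None:
        self.data = list(data)  # copy so the caller's list is not mutated
        self.eta, self.lam, self.sigma = eta, lam, sigma
        self.unlearn_steps = unlearn_steps
        self.rng = random.Random(seed)
        self.w = noisy_gd(0.0, self.data, train_steps, eta, lam, sigma, self.rng)

    def delete(self, index: int) -> float:
        """Remove one record, fine-tune with noisy GD, and return the new published parameter."""
        del self.data[index]
        self.w = noisy_gd(self.w, self.data, self.unlearn_steps,
                          self.eta, self.lam, self.sigma, self.rng)
        return self.w


if __name__ == "__main__":
    # Four points on y = 2x plus one outlier at index 4.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, -20.0)]
    model = NoisyGDUnlearner(data)
    print(model.w)          # should be near the ridge optimum -8/11.1 on all five records
    print(model.delete(4))  # should be near 15/7.6 once the outlier is removed
```

With `eta = 0.05`, each update contracts the distance to the ridge optimum by a factor of `1 - eta * (mean(x^2) + lam)`. The 50 fine-tuning steps after a deletion are therefore enough on this toy problem to move the model close to the optimum on the remaining data, at a fraction of the cost of the initial training run.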