``The right to be forgotten'' guaranteed by user data privacy laws is becoming increasingly important. Machine unlearning aims to efficiently remove the effect of certain data points on the trained model parameters, so that the resulting model is approximately the same as one retrained from scratch. This work proposes stochastic gradient Langevin unlearning, the first unlearning framework based on noisy stochastic gradient descent (SGD) with privacy guarantees for approximate unlearning problems under a convexity assumption. Our results show that mini-batch gradient updates provide a superior privacy-complexity trade-off compared to the full-batch counterpart. Our unlearning approach offers several algorithmic benefits, including complexity savings compared to retraining and support for sequential and batch unlearning. To examine the privacy-utility-complexity trade-off of our method, we conduct experiments on benchmark datasets and compare against prior works. Under the same privacy constraint, our approach achieves similar utility while using only $2\%$ and $10\%$ of the gradient computations required by state-of-the-art gradient-based approximate unlearning methods in the mini-batch and full-batch settings, respectively.
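The idea of unlearning by continuing noisy SGD rather than retraining can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the function name `noisy_sgd`, the regularised logistic loss, the projection radius, the noise scale `sqrt(2 * lr) * sigma`, and all hyperparameters. The epoch counts are not calibrated to any privacy guarantee; the sketch only shows where the saving in gradient computations comes from.

```python
import numpy as np

def noisy_sgd(w, X, y, epochs, batch_size, lr, sigma, radius, rng):
    """Projected noisy mini-batch SGD on a regularised logistic loss (illustrative only)."""
    n = X.shape[0]
    grad_evals = 0  # number of per-example gradient computations
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            margins = y[idx] * (X[idx] @ w)
            # Gradient of mean log(1 + exp(-margin)) plus a small ridge term.
            coeff = -y[idx] / (1.0 + np.exp(margins))
            grad = X[idx].T @ coeff / len(idx) + 1e-2 * w
            grad_evals += len(idx)
            # Langevin-style step: gradient descent plus Gaussian noise (assumed scale).
            noise = np.sqrt(2.0 * lr) * sigma * rng.standard_normal(w.shape)
            w = w - lr * grad + noise
            # Project back onto a ball so the iterates stay bounded.
            norm = np.linalg.norm(w)
            if norm > radius:
                w = w * (radius / norm)
    return w, grad_evals

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ np.ones(d) + 0.1 * rng.standard_normal(n))
hyper = dict(batch_size=20, lr=0.1, sigma=0.01, radius=10.0, rng=rng)

# Learning: noisy SGD on the full dataset.
w_learn, cost_learn = noisy_sgd(np.zeros(d), X, y, epochs=20, **hyper)

# Unlearning request: drop example 0, then fine-tune from the learned weights.
X_new, y_new = X[1:], y[1:]
w_unlearn, cost_unlearn = noisy_sgd(w_learn, X_new, y_new, epochs=1, **hyper)

# Baseline: retrain from scratch on the updated dataset.
w_retrain, cost_retrain = noisy_sgd(np.zeros(d), X_new, y_new, epochs=20, **hyper)

print(cost_unlearn, cost_retrain)
```

In this toy setup the unlearning pass touches each remaining example once, while retraining repeats the full training schedule; how many unlearning epochs are actually needed for a given privacy level is what the paper's analysis determines, and the sketch does not attempt to reproduce that.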