To comply with AI and data regulations, the ability to remove private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the required data in a timely manner while preserving model performance. In this work, we address the zero-shot unlearning scenario, in which an unlearning algorithm must remove data given only a trained model and the data to be forgotten. We explore unlearning from an information-theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach minimises the gradient of the learned function over a small neighbourhood around a target forget point. This induces a smoothing effect, causing forgetting by moving the decision boundary of the classifier. Through a series of low-dimensional experiments, we build intuition for why this approach can unlearn forget samples while preserving general model performance. We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that it is competitive with the state of the art under the strict constraints of zero-shot unlearning.
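The core operation described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: a hypothetical tiny tanh network stands in for the trained model, the neighbourhood is sampled with Gaussian noise around an assumed forget point, and both the input gradient and the parameter update are computed by finite differences rather than autodiff. Minimising the average squared input-gradient norm over the neighbourhood flattens the learned function locally, which is the smoothing effect the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 2, 4  # input dimension, hidden width (illustrative sizes)

def unpack(theta):
    """Split a flat parameter vector into the layers of a 2-4-1 network."""
    W1 = theta[:D * H].reshape(D, H)
    b1 = theta[D * H:D * H + H]
    w2 = theta[D * H + H:D * H + 2 * H]
    b2 = theta[-1]
    return W1, b1, w2, b2

def score(theta, x):
    """Pre-sigmoid score of the stand-in model."""
    W1, b1, w2, b2 = unpack(theta)
    return np.tanh(x @ W1 + b1) @ w2 + b2

def input_grad_sq(theta, x, eps=1e-4):
    """Squared norm of d(score)/dx via central finite differences."""
    g = np.zeros(D)
    for i in range(D):
        e = np.zeros(D)
        e[i] = eps
        g[i] = (score(theta, x + e) - score(theta, x - e)) / (2 * eps)
    return g @ g

def smoothing_loss(theta, neighbourhood):
    """Average squared input-gradient norm over the sampled neighbourhood."""
    return np.mean([input_grad_sq(theta, p) for p in neighbourhood])

# Hypothetical "trained" weights and forget point.
theta = rng.standard_normal(D * H + 2 * H + 1)
x_forget = np.array([1.0, 0.2])
neighbourhood = x_forget + 0.1 * rng.standard_normal((16, D))

before = smoothing_loss(theta, neighbourhood)
for _ in range(200):
    # Finite-difference gradient of the smoothing loss w.r.t. the parameters.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = 1e-4
        grad[i] = (smoothing_loss(theta + e, neighbourhood)
                   - smoothing_loss(theta - e, neighbourhood)) / 2e-4
    theta -= 0.05 * grad
after = smoothing_loss(theta, neighbourhood)
print(f"gradient penalty around forget point: {before:.4f} -> {after:.4f}")
```

As the penalty shrinks, the function becomes locally flat around the forget point, so the classifier's boundary no longer bends to accommodate that sample; in a real setting one would replace the finite differences with automatic differentiation over the actual trained network.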