To comply with AI and data regulations, trained machine learning models increasingly need to forget private or copyrighted information. The key challenge in unlearning is forgetting the necessary data in a timely manner while preserving model performance. In this work, we address the zero-shot unlearning scenario, in which an unlearning algorithm must remove data given only a trained model and the data to be forgotten. We explore unlearning from an information-theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach minimises the gradient of the learned function over a small neighbourhood around each target forget point. This induces a smoothing effect that causes forgetting by moving the classifier's decision boundary. Through a series of low-dimensional experiments, we build intuition for why this approach can unlearn forget samples while preserving general model performance. We perform an extensive empirical evaluation over a range of contemporary benchmarks, verifying that our method is competitive with the state of the art under the strict constraints of zero-shot unlearning. Code for the project can be found at https://github.com/jwf40/Information-Theoretic-Unlearning
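The core mechanism described above, penalising the gradient of the learned function over a small neighbourhood of a forget point, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `unlearn_step`, the Gaussian neighbourhood sampling, and the hyperparameters `sigma` and `n_samples` are all assumptions made for illustration.

```python
import torch

def unlearn_step(model, optimizer, x_forget, sigma=0.05, n_samples=8):
    """One gradient-smoothing step around a single forget point (hypothetical sketch).

    Penalises the squared norm of the model's input gradient over points
    sampled in a small neighbourhood of `x_forget`, flattening the learned
    function locally and thereby moving the decision boundary.
    """
    optimizer.zero_grad()
    # Sample a small Gaussian neighbourhood around the forget point.
    noise = sigma * torch.randn(n_samples, *x_forget.shape)
    x_nbhd = (x_forget.unsqueeze(0) + noise).requires_grad_(True)
    out = model(x_nbhd)
    # Gradient of the (summed) outputs with respect to the neighbourhood inputs;
    # create_graph=True so the penalty itself is differentiable w.r.t. the weights.
    grads = torch.autograd.grad(out.sum(), x_nbhd, create_graph=True)[0]
    # Mean squared input-gradient norm over the neighbourhood samples.
    loss = grads.pow(2).sum(dim=tuple(range(1, grads.dim()))).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeatedly applying such a step to each forget point would smooth the classifier around those points, which is the intuition the low-dimensional experiments in the paper explore.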