Machine unlearning, the study of efficiently removing the impact of specific training instances on a model, has garnered increased attention in recent years due to regulatory guidelines such as the \emph{Right to be Forgotten}. Achieving exact unlearning typically requires fully retraining the model, which is computationally infeasible for very large models such as Large Language Models (LLMs). To this end, recent work has proposed several algorithms that approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or because only query access to the LLM is available. In this work, we propose a new class of unlearning methods for LLMs called ``In-Context Unlearning.'' These methods unlearn instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters. To unlearn specific training instances, we present these instances to the LLM at inference time along with labels that differ from their ground truth. Our experimental results demonstrate that in-context unlearning performs on par with, and in some cases outperforms, state-of-the-art methods that require access to model parameters, effectively removing the influence of specific instances on the model while preserving test accuracy.
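To make the inference-time procedure concrete, the following is a minimal sketch of how an in-context unlearning prompt could be assembled: the instance to forget is shown with a flipped label, followed by correctly labeled context examples and the query. The function name, the sentiment-classification framing, and the prompt template are illustrative assumptions, not the paper's exact format.

```python
def build_icul_prompt(forget_point, context_examples, query_text,
                      label_set=("negative", "positive")):
    """Assemble an inference-time prompt that aims to 'unlearn' forget_point.

    Hypothetical template: the instance to forget appears first with a
    label different from its ground truth, followed by correctly labeled
    examples, then the query whose prediction we want from the LLM.
    """
    text, true_label = forget_point
    # Flip the label of the instance whose influence we want removed.
    flipped = label_set[1 - label_set.index(true_label)]
    lines = [f"Review: {text}\nSentiment: {flipped}"]
    # Append correctly labeled in-context examples.
    for ex_text, ex_label in context_examples:
        lines.append(f"Review: {ex_text}\nSentiment: {ex_label}")
    # End with the query; the model completes the final label slot.
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_icul_prompt(
    ("The movie was dull.", "negative"),
    [("A wonderful film.", "positive"), ("Terrible pacing.", "negative")],
    "An instant classic.",
)
```

The resulting string would then be sent to the LLM via whatever query interface is available, requiring no access to model parameters.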