Machine unlearning, the study of efficiently removing the impact of specific training instances on a model, has garnered increased attention in recent years due to regulatory guidelines such as the \emph{Right to be Forgotten}. Achieving exact unlearning typically requires fully retraining the model, which is computationally infeasible for very large models such as Large Language Models (LLMs). To this end, recent work has proposed several algorithms that approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or to having only query access to the LLM. In this work, we propose a new class of unlearning methods for LLMs called ``In-Context Unlearning.'' This method unlearns instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters. To unlearn specific training instances, we present these instances to the LLM at inference time along with labels that differ from their ground truth. Our experimental results demonstrate that in-context unlearning performs on par with, and in some cases outperforms, other state-of-the-art methods that require access to model parameters, effectively removing the influence of specific instances on the model while preserving test accuracy.
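The core mechanism can be illustrated with a minimal sketch: a prompt is assembled in which each forget instance appears with its label flipped, followed by correctly labelled context examples and the query. The helper below is hypothetical (the paper's exact prompt template and label set may differ) and assumes a binary sentiment task for concreteness.

```python
def build_icul_prompt(forget_points, context_points, query,
                      labels=("negative", "positive")):
    """Assemble an in-context unlearning prompt (illustrative sketch).

    forget_points: (text, ground_truth_label) pairs whose influence
        should be removed; their labels are flipped in the prompt.
    context_points: (text, label) pairs shown with correct labels.
    query: the input whose label the LLM is asked to predict.
    """
    lines = []
    for text, label in forget_points:
        # Flip the label of each instance to be "unlearned".
        flipped = labels[1 - labels.index(label)]
        lines.append(f"Review: {text}\nSentiment: {flipped}")
    for text, label in context_points:
        # Context examples keep their ground-truth labels.
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The query is appended with an empty label for the LLM to complete.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)
```

At inference time, the resulting string would be sent to the (query-access-only) LLM in place of the standard few-shot prompt; no gradient step or parameter update is involved.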