Machine unlearning, the study of efficiently removing the influence of specific training points from a trained model, has recently attracted increasing attention, driven by the need to comply with privacy regulations such as the \emph{Right to be Forgotten}. Although unlearning is particularly relevant for LLMs in light of the copyright issues they raise, achieving exact unlearning is computationally infeasible for very large models. To this end, recent work has proposed several algorithms that approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them. This assumption may not hold in practice, whether because of computational constraints or because the LLM is only accessible through an API. In this work, we propose a new class of unlearning methods for LLMs, which we call ``In-Context Unlearning'', that supplies inputs in context instead of updating model parameters. To unlearn a particular training instance, we provide that instance with a flipped label, together with additional correctly labelled instances, and prepend these as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts effectively remove the influence of specific training points while maintaining performance that is competitive with, and in some cases exceeds, that of state-of-the-art unlearning methods requiring access to the LLM parameters.
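To make the context construction concrete, the following is a minimal sketch of how such an unlearning prompt might be assembled for a binary classification task. It assumes a two-label set (`"negative"`/`"positive"`), a simple `Input:`/`Label:` template, and that the flipped-label instance is placed first; the helper names `flip_label` and `build_icul_prompt` are hypothetical and are not taken from the method's reference implementation. The resulting string would be sent to the LLM as-is at inference time, with no parameter update.

```python
from typing import List, Tuple

# (input text, label); a hypothetical representation of one labelled example
Example = Tuple[str, str]

def flip_label(label: str, labels: Tuple[str, str] = ("negative", "positive")) -> str:
    """Return the other label in an assumed binary label set."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return labels[1] if label == labels[0] else labels[0]

def build_icul_prompt(forget: Example, context: List[Example], query: str) -> str:
    """Assemble an in-context unlearning prompt (hypothetical template).

    The instance to forget is shown with its label flipped, followed by
    correctly labelled examples, and finally the query left unlabelled
    for the model to complete.
    """
    text, label = forget
    blocks = [f"Input: {text}\nLabel: {flip_label(label)}"]
    blocks += [f"Input: {t}\nLabel: {l}" for t, l in context]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

# Usage: forget a (hypothetical) positive training review, then query the model on it.
prompt = build_icul_prompt(
    forget=("I loved this film", "positive"),
    context=[("Terrible plot", "negative"), ("Great acting", "positive")],
    query="I loved this film",
)
print(prompt)
```

Keeping prompt construction separate from the model call reflects the setting described above: only the input changes, so the same function can be used whether the LLM runs locally or sits behind an API.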