In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their excellent capabilities in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application. Mitigating this issue and transforming these models into purer assistants is crucial for their widespread applicability. Unfortunately, repeatedly retraining LLMs to eliminate undesirable knowledge is impractical due to their immense number of parameters. Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern and is notably advantageous in the context of LLMs: it allows harmful knowledge to be removed efficiently without affecting unrelated knowledge in the model. To this end, we provide a survey of knowledge unlearning in the era of LLMs. First, we formally define the knowledge unlearning problem and distinguish it from related work. Subsequently, we categorize existing knowledge unlearning methods into three classes (those based on parameter optimization, parameter merging, and in-context learning) and describe each class in detail. We further present the evaluation datasets used by existing methods, and finally conclude the survey by discussing ongoing challenges and future directions.