In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their strong capabilities in knowledge-based question answering and reasoning, their tendency to retain faulty or even harmful knowledge poses a risk of malicious use. Mitigating this issue and turning these models into assistants free of such knowledge is crucial for their widespread applicability. Unfortunately, repeatedly retraining LLMs to eliminate undesirable knowledge is impractical because of their immense number of parameters. Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern and is notably advantageous in the context of LLMs: it allows harmful knowledge to be removed efficiently without affecting unrelated knowledge in the model. To this end, we provide a survey of knowledge unlearning in the era of LLMs. First, we formally define the knowledge unlearning problem and distinguish it from related work. We then categorize existing knowledge unlearning methods into three classes, those based on parameter optimization, parameter merging, and in-context learning, and describe each class in detail. We further present the evaluation datasets used by existing methods, and conclude the survey by discussing ongoing challenges and future directions.