Large language models (LLMs) have achieved remarkable success across natural language processing tasks, yet their widespread deployment raises pressing concerns around privacy, copyright, security, and bias. Machine unlearning has emerged as a promising paradigm for selectively removing knowledge or data from trained models without full retraining. In this survey, we provide a structured overview of unlearning methods for LLMs, categorizing existing approaches into data-centric, parameter-centric, architecture-centric, hybrid, and other strategies. We also review the evaluation ecosystem, including benchmarks, metrics, and datasets designed to measure forgetting effectiveness, knowledge retention, and robustness. Finally, we outline key challenges and open problems, such as scalability and efficiency, formal guarantees, cross-language and multimodal unlearning, and robustness against adversarial relearning. By synthesizing current progress and highlighting open directions, this paper aims to serve as a roadmap for developing reliable and responsible unlearning techniques in large language models.