With the implementation of personal data privacy regulations, the field of machine learning (ML) faces the challenge of the "right to be forgotten". Machine unlearning has emerged to address this issue, aiming to delete data and reduce its impact on models according to user requests. Despite widespread interest in machine unlearning, comprehensive surveys of its latest advancements, especially for Large Language Models (LLMs), are lacking. This survey aims to fill this gap by providing an in-depth exploration of machine unlearning, covering its definition, classification, and evaluation criteria, as well as the challenges that arise in different environments and their solutions. Specifically, this paper categorizes and investigates unlearning for both traditional models and LLMs, and proposes methods for evaluating the effectiveness and efficiency of unlearning along with standards for performance measurement. It reveals the limitations of current unlearning techniques and emphasizes the importance of comprehensive unlearning evaluation to avoid arbitrary forgetting. The survey not only summarizes the key concepts of unlearning technology but also points out its prominent issues and feasible directions for future research, providing valuable guidance for scholars in the field.