Large language model (LLM) unlearning has garnered increasing attention for its potential to address security and privacy concerns, spurring extensive research in the field. However, much of this research has concentrated on instance-level unlearning, i.e., removing predefined instances that contain sensitive content. This focus leaves a significant gap in the study of full entity-level unlearning, which is critical in real-world scenarios such as copyright protection. To this end, we propose the novel task of entity-level unlearning, which aims to completely erase entity-related knowledge from the target model. To investigate this task thoroughly, we systematically evaluate popular unlearning algorithms and find that current methods struggle to achieve effective entity-level unlearning. We then examine the factors that influence unlearning performance, identifying knowledge coverage and the size of the forget set as playing pivotal roles. Notably, our analysis also reveals that entities introduced through fine-tuning are more vulnerable to unlearning than those acquired during pre-training. These findings collectively offer valuable insights for advancing entity-level unlearning in LLMs.
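To make the task concrete, the following is a minimal sketch of one common family of unlearning baselines, gradient ascent on a forget set, applied to an entity. The model name, the contents of `forget_set`, and the hyperparameters here are illustrative assumptions for exposition, not the exact algorithms or setup evaluated in this work.

```python
# Minimal sketch: gradient-ascent unlearning over an entity's forget set.
# Assumptions (not from this paper): model choice, forget_set contents,
# and hyperparameters are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in for the target model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Hypothetical forget set: text expressing the entity's knowledge.
forget_set = [
    "Q: Where was Harry Potter born? A: Godric's Hollow.",
    "Q: Who are Harry Potter's parents? A: James and Lily Potter.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for text in forget_set:
    batch = tokenizer(text, return_tensors="pt")
    # Standard language-modeling loss on the forget sample...
    loss = model(**batch, labels=batch["input_ids"]).loss
    # ...negated, so the update *ascends* the loss and degrades the
    # model's ability to reproduce the entity-related knowledge.
    (-loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```

Entity-level unlearning stresses such methods because the forget set must cover all of an entity's knowledge in the model, which connects directly to the knowledge-coverage and forget-set-size factors analyzed above.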