Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful content. Existing LLM unlearning methods rarely account for the continual, high-volume nature of real-world deletion requests, which can cause utility degradation and catastrophic forgetting as requests accumulate. To address this challenge, we introduce \fit, a framework for continual unlearning that handles large numbers of deletion requests while remaining robust to both catastrophic forgetting and post-unlearning recovery. \fit mitigates degradation through rigorous data \underline{F}iltering, \underline{I}mportance-aware updates, and \underline{T}argeted layer attribution, enabling stable performance across long sequences of unlearning operations and achieving a favorable balance between forgetting effectiveness and utility retention. To support realistic evaluation, we present \textbf{PCH}, a benchmark covering \textbf{P}ersonal information, \textbf{C}opyright, and \textbf{H}armful content under sequential deletion scenarios, along with two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), which jointly assess forgetting quality and utility preservation. Extensive experiments on four open-source LLMs with hundreds of deletion requests show that \fit achieves the strongest trade-off between F.D. and R.U., surpasses existing methods on MMLU, CommonsenseQA, and GSM8K, and remains robust against both relearning and quantization-based recovery attacks.