Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. However, this poses risks of privacy and copyright violations, highlighting the need for efficient machine unlearning methods that remove sensitive data without retraining from scratch. While Gradient Ascent (GA) is commonly used to unlearn by reducing the likelihood of generating unwanted content, it leads to unstable optimization and catastrophic forgetting of retained knowledge. We also find that combining GA with low-rank adaptation results in poor trade-offs between computational cost and generative performance. To address these challenges, we propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs. First, we introduce the Inverted Hinge Loss, which suppresses unwanted tokens while maintaining fluency by boosting the probability of the next most likely token. Second, we develop a data-adaptive initialization for LoRA adapters via low-rank approximation weighted with relative Fisher information, thereby focusing updates on parameters critical for removing targeted knowledge. Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models, as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models, demonstrate that our approach effectively removes sensitive information while preserving reasoning and generative capabilities with minimal impact. Our implementation can be found at https://github.com/csm9493/efficient-llm-unlearning.
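The abstract describes the Inverted Hinge Loss only at a high level. The snippet below is a minimal, hedged PyTorch sketch of a loss with the stated behavior, penalizing the unwanted token's probability relative to the strongest competing token so that probability mass shifts to the next-best candidate rather than collapsing fluency as plain gradient ascent can. The function name `inverted_hinge_loss`, the margin of 1, and the tensor conventions are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def inverted_hinge_loss(logits, target_ids):
    """Hedged sketch of an inverted hinge-style unlearning loss (assumed form).

    logits:     (batch, seq_len, vocab) raw model outputs
    target_ids: (batch, seq_len) tokens whose generation should be suppressed
    """
    probs = F.softmax(logits, dim=-1)  # (B, T, V)

    # Probability assigned to the unwanted (ground-truth) token at each position.
    p_target = probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

    # Probability of the most likely alternative token (target masked out).
    masked = probs.scatter(-1, target_ids.unsqueeze(-1), float("-inf"))
    p_best_other = masked.max(dim=-1).values

    # Hinge-style penalty: push the unwanted token below the best alternative.
    # The margin of 1.0 is an illustrative choice.
    loss = F.relu(1.0 + p_target - p_best_other)
    return loss.mean()
```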
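The data-adaptive LoRA initialization is likewise only described in words above. Below is a rough sketch of one way a relative-Fisher-weighted low-rank initialization could look: directions where the forget-set Fisher information dominates the retain-set Fisher information are emphasized before taking a rank-r factorization. The helper name `fisher_weighted_lora_init`, the diagonal Fisher inputs, and the specific weighting/SVD scheme are assumptions for illustration, not necessarily the method as implemented in the repository.

```python
import torch


def fisher_weighted_lora_init(W, fisher_forget, fisher_retain, rank, eps=1e-8):
    """Hedged sketch of a data-adaptive LoRA initialization (assumed scheme).

    W:             (out, in) pretrained weight of one linear layer
    fisher_forget: (out, in) diagonal Fisher estimate on the forget set
    fisher_retain: (out, in) diagonal Fisher estimate on the retain set
    rank:          LoRA rank r
    """
    # Relative importance: large where the forget data dominates the retain data.
    rel = fisher_forget / (fisher_retain + eps)

    # Low-rank approximation of the importance-scaled weight.
    U, S, Vh = torch.linalg.svd(rel.sqrt() * W, full_matrices=False)
    B = U[:, :rank] * S[:rank].sqrt()               # (out, r)
    A = S[:rank].sqrt().unsqueeze(-1) * Vh[:rank]   # (r, in)

    # LoRA adapter would then be applied as W + B @ A; sign and scaling
    # conventions here are illustrative and may differ from the paper's.
    return A, B
```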