Given the prevalence of large language models (LLMs) and the prohibitive cost of training these models from scratch, dynamically forgetting specific knowledge, e.g., private or proprietary information, without retraining the model has become an important capability. This paper proposes a novel method, called UNLEARN, to achieve this objective. The approach builds upon subspace methods to identify and specifically target knowledge for removal without adversely affecting other knowledge in the LLM. Results demonstrate that 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model, significantly outperforming the discriminatory abilities of the previous state-of-the-art. A dual method, called LEARN, is also proposed for targeted knowledge addition. Results show that LEARN matches the fine-tuning accuracy of Low-Rank Adaptation (LoRA) without adversely affecting similar tasks.
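To make the subspace idea concrete, the following is a minimal illustrative sketch of generic subspace-based forgetting, not the paper's exact UNLEARN algorithm: it estimates a low-rank subspace from activations elicited by the knowledge to be forgotten and projects a layer's weights away from it, leaving other directions (and thus other knowledge) intact. The function name `forget_subspace` and the `rank` hyperparameter are hypothetical choices for this example.

```python
# Illustrative sketch (assumed, not the paper's UNLEARN method): remove the
# component of a weight matrix lying in the subspace spanned by activations
# collected while the model processes the knowledge to be forgotten.
import numpy as np

def forget_subspace(W, target_activations, rank=8):
    """Project W away from the top-`rank` subspace of the target activations.

    W                  : (d_out, d_in) weight matrix of one layer
    target_activations : (n_samples, d_in) layer inputs gathered from prompts
                         about the knowledge to be forgotten
    rank               : dimensionality of the subspace to remove (assumed)
    """
    # Orthonormal basis of the "forget" subspace via SVD of the activations.
    _, _, Vt = np.linalg.svd(target_activations, full_matrices=False)
    U = Vt[:rank].T                      # (d_in, rank) basis vectors

    # Projector onto the forget subspace; subtracting that component means
    # inputs lying in the subspace no longer produce the learned response.
    P = U @ U.T                          # (d_in, d_in)
    return W - W @ P

# Usage: applied per layer; directions of W orthogonal to the forget subspace
# are untouched, which is what preserves performance on other knowledge.
W_new = forget_subspace(np.random.randn(64, 128), np.random.randn(200, 128), rank=4)
```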