As AI models are trained on ever-expanding datasets, the ability to remove the influence of specific data from trained models has become essential for privacy protection and regulatory compliance. Unlearning addresses this challenge by selectively removing parametric knowledge from a trained model without retraining from scratch, which is critical for resource-intensive models such as Large Language Models (LLMs). Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to "forget" specific data. We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned, while preserving the information supported by the data to be retained. By penalizing marginal information, our method yields an explicit upper bound on the unlearned dataset's residual influence on the trained model, providing provable undetectability. Extensive experiments confirm that our approach outperforms current state-of-the-art unlearning methods, delivering reliable forgetting and better-preserved general model performance across diverse benchmarks. This advancement represents an important step toward making AI systems more controllable and compliant with privacy and copyright regulations without compromising their effectiveness.
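The notion of bounding a forget set's residual influence can be made concrete with a toy sketch. This is not the paper's actual objective (which is not specified here); it only illustrates the general idea of measuring residual influence as a divergence between an unlearned model's predictions on the forget set and those of a reference model never exposed to that data. The names `kl_divergence` and `marginal_info_penalty` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def marginal_info_penalty(unlearned_probs, reference_probs):
    """Average KL between the unlearned model's predictive distributions on
    forget-set prompts and those of a reference model trained without that
    data. A small value suggests the forget set's residual (marginal)
    influence is bounded, i.e. the unlearned model is hard to distinguish
    from one that never saw the data."""
    kls = [kl_divergence(p, q) for p, q in zip(unlearned_probs, reference_probs)]
    return sum(kls) / len(kls)

# Toy example: predictions on two forget-set prompts over a 3-token vocabulary.
unlearned = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
reference = [[0.5, 0.3, 0.2], [0.45, 0.35, 0.2]]
penalty = marginal_info_penalty(unlearned, reference)
```

In an actual training loop, a term of this kind would be added to the fine-tuning loss so that gradient updates jointly forget the target data and preserve retained knowledge; driving the penalty toward zero is what would give an explicit bound on detectable influence.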