Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. However, training LLMs on human-written text entails a significant risk of privacy and copyright violations, motivating efficient machine unlearning frameworks that remove knowledge of sensitive data without retraining the model from scratch. While Gradient Ascent (GA) is widely used for unlearning by reducing the likelihood of generating unwanted information, unboundedly maximizing the cross-entropy loss causes not only unstable optimization but also catastrophic forgetting of knowledge that must be retained. We further find that combining GA with low-rank adaptation yields a markedly suboptimal trade-off between computational cost and generative performance. In light of these limitations, we propose two novel techniques for robust and cost-efficient unlearning on LLMs. We first design an Inverted Hinge Loss that suppresses unwanted tokens by increasing the probability of the next most likely token, thereby retaining fluency and structure in language generation. We also propose to initialize low-rank adapter weights via Fisher-weighted low-rank approximation, which enables faster unlearning and better knowledge retention by focusing model updates on the parameters that matter most for generating the textual data we wish to remove.
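The idea behind the Inverted Hinge Loss can be sketched as follows: rather than increasing cross-entropy without bound as GA does, the per-token loss is bounded and only pushes the unwanted target token's probability below that of the runner-up token. This is a minimal numpy sketch of that formulation (the exact normalization and reduction used in the paper may differ); `inverted_hinge_loss` and its arguments are illustrative names.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def inverted_hinge_loss(logits, targets):
    """Per-token loss: 1 + p(target) - max_{v != target} p(v).

    Minimizing it lowers the unwanted token's probability only until
    the runner-up overtakes it, so the loss stays bounded in (0, 2)
    instead of growing without limit as with gradient ascent.
    """
    probs = softmax(logits)                  # (seq_len, vocab)
    idx = np.arange(len(targets))
    p_target = probs[idx, targets]
    masked = probs.copy()
    masked[idx, targets] = -np.inf           # exclude the target itself
    p_runner_up = masked.max(axis=-1)
    return float(np.mean(1.0 + p_target - p_runner_up))
```

Because the runner-up token is rewarded as the target is suppressed, the model shifts probability mass to a plausible alternative rather than collapsing the whole distribution, which is what preserves fluency.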
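The Fisher-weighted initialization can be sketched as a weighted truncated SVD: columns of a pretrained weight matrix are scaled by the square root of a diagonal Fisher estimate before factorization, so the low-rank adapter concentrates on directions important to the data being removed. This is a sketch under assumed conventions (per-input-dimension Fisher diagonal, `B @ A` adapter parameterization); names and the exact weighting scheme are illustrative, not the paper's verbatim procedure.

```python
import numpy as np

def fisher_weighted_lora_init(W, fisher_diag, rank):
    """Initialize LoRA factors from a Fisher-weighted SVD of W.

    W           : (d_out, d_in) pretrained weight matrix
    fisher_diag : (d_in,) positive diagonal Fisher estimate per input dim
    rank        : adapter rank r
    Returns A (r, d_in) and B (d_out, r) with B @ A ~= W at full rank.
    """
    s = np.sqrt(fisher_diag)                           # weighting per column
    U, S, Vt = np.linalg.svd(W * s[None, :], full_matrices=False)
    B = U[:, :rank] * S[:rank]                         # up-projection
    A = Vt[:rank, :] / s[None, :]                      # undo the weighting
    return A, B
```

Directions with large Fisher weight incur a larger penalty when truncated, so the top-r factors preferentially capture exactly the parameters most responsible for the text to be unlearned.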