Towards Mitigating Excessive Forgetting in LLM Unlearning via Entanglement-Guidance with Proxy Constraint

Large language models (LLMs) are trained on massive datasets that may include private or copyrighted content. Due to growing privacy and ownership concerns, data owners may request the removal of their data from trained models. Machine unlearning provides a practical solution by removing the influence of specific data without full retraining. However, most existing methods still suffer from over-unlearning because they lack a principled mechanism to regulate the forgetting boundary, which leads to unnecessary utility degradation and heightened privacy and robustness risks. In this work, we propose EGUP (Entanglement-Guided Unlearning with Proxy Constraint), a novel framework that leverages entanglement and a proxy constraint to guide the unlearning process while mitigating over-unlearning. Within each iteration, EGUP employs inter-sample entanglement to adaptively reweight the unlearning strength, assigning greater unlearning effort to forget samples that are semantically closer to retained knowledge. Across iterations, EGUP leverages intra-sample entanglement to track the representation shift of each forget sample and dynamically adjust its unlearning effort. In addition, we incorporate a proxy constraint that approximates the model's expected outputs after unlearning, forming a reference boundary that softly regularizes the unlearning process. EGUP is compatible with existing gradient-based objectives and serves as a plug-and-play enhancement. We evaluate EGUP on the TOFU and MUSE benchmarks, demonstrating consistent improvements in the unlearning-utility trade-off across multiple LLMs. Moreover, EGUP achieves performance close to that of the retrained model while remaining scalable and robust.
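
As a rough illustration of the within-iteration reweighting idea, the sketch below scores each forget sample by its mean cosine similarity to a set of retain-sample embeddings and converts the scores into per-sample loss weights with a softmax. This is not EGUP's actual formulation: the abstract does not specify how entanglement is measured, so the similarity measure, the softmax normalization, the `temperature` parameter, and all function names here are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def entanglement_weights(forget_embs, retain_embs, temperature=1.0):
    """Hypothetical inter-sample weighting (an assumption, not the paper's formula).

    Each forget sample is scored by its mean cosine similarity to the retain
    embeddings; a softmax over the scores yields weights that are rescaled so
    their mean is 1. Samples closer to retained knowledge get larger weights.
    """
    scores = [
        sum(cosine(f, r) for r in retain_embs) / len(retain_embs)
        for f in forget_embs
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def reweighted_forget_loss(per_sample_losses, weights):
    """Weighted mean of per-sample forget losses from any gradient-based objective."""
    weighted = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted / len(per_sample_losses)


# Toy embeddings: the first forget sample lies close to the retain set,
# the second is nearly orthogonal to it.
forget_embs = [[1.0, 0.0], [0.0, 1.0]]
retain_embs = [[1.0, 0.0], [1.0, 0.1]]
weights = entanglement_weights(forget_embs, retain_embs)
loss = reweighted_forget_loss([1.0, 1.0], weights)
```

Because the weights are normalized to a mean of 1, the reweighted loss keeps the same overall scale as the unweighted objective, which is one simple way such a scheme could act as a drop-in modification to an existing forget loss.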
