Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance. Eliminating the influence of such problematic datapoints on a model through complete retraining -- by repeatedly pretraining the model on datasets that exclude these specific instances -- is computationally prohibitive. To address this, unlearning algorithms have been proposed that aim to eliminate the influence of particular datapoints at a low computational cost while leaving the rest of the model intact. However, precisely unlearning the influence of data on a large language model has proven to be a major challenge. In this work, we propose a new algorithm, MSA (Model State Arithmetic), for unlearning datapoints in large language models. MSA utilizes prior model checkpoints -- artifacts that record model states at different stages of pretraining -- to estimate and counteract the effect of targeted datapoints. Our experimental results show that MSA achieves competitive performance and often outperforms existing machine unlearning algorithms across multiple benchmarks, models, and evaluation metrics, suggesting that MSA could be an effective approach towards more flexible large language models that are capable of data erasure.
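To make the checkpoint-based idea concrete, the sketch below shows one plausible form of parameter arithmetic between pretraining checkpoints; it is not the update rule defined in the paper. The delta-subtraction scheme, the scaling factor `alpha`, and the checkpoint file names are illustrative assumptions, included only to indicate how recorded model states could be combined to counteract the influence of data seen between two checkpoints.

```python
# Illustrative sketch only: a hypothetical checkpoint-arithmetic update for
# counteracting the influence of data seen between two pretraining
# checkpoints. MSA's actual procedure is specified in the paper; the
# subtraction rule and the names below are assumptions for illustration.
import torch


def checkpoint_delta_unlearn(final_state, ckpt_before, ckpt_after, alpha=1.0):
    """Subtract a scaled parameter delta, accumulated between two checkpoints
    that bracket the targeted data, from the final model's parameters."""
    unlearned = {}
    for name, w_final in final_state.items():
        # Change in this parameter attributed to the bracketed training data.
        delta = ckpt_after[name] - ckpt_before[name]
        # Counteract that change in the final model.
        unlearned[name] = w_final - alpha * delta
    return unlearned


# Hypothetical usage with state dicts loaded from saved pretraining artifacts:
# final = torch.load("final_model.pt")
# before = torch.load("ckpt_step_10000.pt")
# after = torch.load("ckpt_step_11000.pt")
# model.load_state_dict(checkpoint_delta_unlearn(final, before, after, alpha=0.5))
```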