Machine unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. Existing unlearning methods, which often provide provable guarantees, typically involve retraining a subset of model parameters based on a forget set. While these approaches show promise in certain scenarios, their underlying assumptions are often violated in real-world applications, particularly when applied to generative models. Furthermore, updating parameters through these unlearning procedures often degrades the general-purpose capabilities the model acquired during pre-training. Motivated by these shortcomings, this paper considers the paradigm of inference-time unlearning, wherein the generative model is equipped with an (approximately correct) verifier that judges whether the model's response satisfies the desired unlearning guarantees. This paper introduces a framework that iteratively refines the quality of the generated responses using feedback from the verifier, without updating the model parameters. The proposed framework leverages conformal prediction to reduce computational overhead and provide distribution-free unlearning guarantees. This paper's approach significantly outperforms existing state-of-the-art methods, reducing unlearning error by up to 93% across challenging unlearning benchmarks.
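The core inference-time loop described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the model call, the substring-matching verifier, and the function names (`generate`, `verifier`, `unlearn_at_inference`) are all hypothetical stand-ins, and the conformal-prediction machinery for bounding the number of resampling rounds is omitted. The sketch only shows the structural idea: resample and refine until an approximate verifier accepts, with no weight updates.

```python
def generate(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for sampling a generative model; each
    # attempt yields a progressively more conservative response.
    candidates = ["leaked detail about the subject", "a safe paraphrase", "a generic answer"]
    return candidates[min(attempt, len(candidates) - 1)]

def verifier(response: str, forget_set: set[str]) -> bool:
    # Approximately correct verifier: rejects any response that
    # echoes content from the forget set (toy substring check).
    return not any(item in response for item in forget_set)

def unlearn_at_inference(prompt: str, forget_set: set[str], max_iters: int = 5) -> str:
    # Iteratively regenerate until the verifier accepts, without
    # touching model parameters; abstain if no response passes.
    for attempt in range(max_iters):
        response = generate(prompt, attempt)
        if verifier(response, forget_set):
            return response
    return "[withheld]"

print(unlearn_at_inference("Tell me about the subject", {"leaked detail"}))
```

In the paper's setting, the fixed iteration cap would instead be calibrated via conformal prediction, giving a distribution-free bound on how often the loop fails to produce a verifier-approved response.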