The widespread popularity of Large Language Models (LLMs), partly due to their unique ability to perform in-context learning, has also brought to light the importance of ethical and safety considerations when deploying these pre-trained models. In this work, we focus on investigating machine unlearning for LLMs motivated by data protection regulations. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we focus on a comparatively lightweight alternative called soft prompting to realize the unlearning of a subset of training data. With losses designed to enforce forgetting as well as utility preservation, our framework \textbf{S}oft \textbf{P}rompting for \textbf{U}n\textbf{l}earning (SPUL) learns prompt tokens that can be appended to an arbitrary query to induce unlearning of specific examples at inference time without updating LLM parameters. We conduct a rigorous evaluation of the proposed method and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting in the context of text classification and question answering with LLMs. We further validate our method using multiple LLMs to highlight the scalability of our framework and provide detailed insights into the choice of hyperparameters and the influence of the size of unlearning data. Our implementation is available at \url{https://github.com/karuna-bhaila/llm_unlearning}.
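The mechanism described above, prepending learned soft-prompt embeddings to a query while the LLM's parameters stay frozen, and training them with a combined forgetting/utility objective, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the simple additive form of the objective, and the weighting parameter `lam` are assumptions for illustration.

```python
import numpy as np

def prepend_soft_prompt(prompt_emb, input_emb):
    """Concatenate trainable soft-prompt vectors before the query's
    token embeddings; the LLM's own weights are never updated."""
    return np.concatenate([prompt_emb, input_emb], axis=0)

def unlearning_objective(forget_loss, retain_loss, lam=1.0):
    """Hypothetical combined objective: drive the loss on the forget
    set up (negative term) while keeping loss on retained data low
    to preserve utility. The exact losses in SPUL differ."""
    return -forget_loss + lam * retain_loss

# Toy example: 5 prompt tokens, a 12-token query, 16-dim embeddings.
k, n, d = 5, 12, 16
rng = np.random.default_rng(0)
P = rng.normal(size=(k, d))   # trainable soft-prompt embeddings
X = rng.normal(size=(n, d))   # frozen embeddings of an input query
Z = prepend_soft_prompt(P, X)
print(Z.shape)                # (17, 16): prompt tokens precede the query
print(unlearning_objective(2.0, 1.0, lam=0.5))  # -1.5
```

At inference time, only `P` is stored and prepended; swapping it out recovers the original model's behavior, which is what makes the approach lightweight compared to fine-tuning.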