This study investigates the `right to be forgotten' in the context of large language models (LLMs). We explore machine unlearning as a solution, focusing on pre-trained models, a notably under-researched area. We present a comprehensive framework for machine unlearning in pre-trained LLMs, including a critical analysis of seven diverse unlearning methods. Evaluating these methods on curated datasets from arXiv, books, and GitHub, we establish a benchmark for unlearning performance and show that they are over $10^5$ times more computationally efficient than retraining. Our results show that combining gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering insight into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.
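To make the combination of gradient ascent with gradient descent on in-distribution data concrete, the following is a minimal toy sketch, not the implementation evaluated in this study. It assumes a two-parameter linear model with squared loss in place of an LLM, and the names `unlearn`, `retain_weight`, `forget`, and `retain` are hypothetical, introduced only for illustration. Each step ascends the loss on the data to be forgotten while descending the loss on retained data.

```python
# Toy sketch (assumption: a 2-parameter linear model stands in for an LLM).
# Gradient ascent on a forget set, combined with gradient descent on retain data.

def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse_loss(w, data):
    """Mean of 0.5 * (prediction - target)^2 over the dataset."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Gradient of mse_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(data)
    return grad

def unlearn(w, forget, retain, lr=0.1, steps=10, retain_weight=1.0):
    """Ascend the forget loss and descend the retain loss at every step.

    retain_weight=0.0 reduces this to plain gradient ascent.
    """
    w = list(w)
    for _ in range(steps):
        # Both gradients are taken at the same point before updating.
        g_forget = mse_grad(w, forget)
        g_retain = mse_grad(w, retain)
        w = [wi + lr * gf - lr * retain_weight * gr
             for wi, gf, gr in zip(w, g_forget, g_retain)]
    return w

# Hypothetical data: the forget example shares a feature with the retain set,
# so ascent on it also disturbs what the model knows about retained data.
retain = [((1.0, 0.0), 1.0), ((2.0, 0.0), 2.0)]
forget = [((1.0, 1.0), 2.0)]
w0 = [1.0, 0.9]  # stand-in for "pre-trained" weights that nearly fit both sets

w_ga = unlearn(w0, forget, retain, retain_weight=0.0)     # ascent only
w_ga_gd = unlearn(w0, forget, retain, retain_weight=1.0)  # ascent + descent

for name, w in [("start", w0), ("GA only", w_ga), ("GA + GD", w_ga_gd)]:
    print(f"{name:8s} forget={mse_loss(w, forget):.4f} "
          f"retain={mse_loss(w, retain):.4f}")
```

In this toy setting both variants raise the loss on the forget example, while the variant with the descent term keeps the retain loss lower than ascent alone. It illustrates only the shape of the combined update and says nothing about the hyperparameter robustness reported for LLMs.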