This study investigates the concept of the `right to be forgotten' within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, with a focus on pre-trained models--a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning methods. Through rigorous evaluation using curated datasets from arXiv, books, and GitHub, we establish a robust benchmark for unlearning performance, demonstrating that these methods are over $10^5$ times more computationally efficient than retraining. Our results show that integrating gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering substantive insights into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.
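As a rough illustration of combining gradient ascent with gradient descent on in-distribution data, the sketch below takes one update step that raises the loss on a forget batch while lowering it on a retain batch. The abstract does not give the exact objective, so everything here is an assumption made for illustration: the toy linear model with squared loss stands in for an LLM, and the names `unlearning_step`, `retain_weight`, `forget_batch`, and `retain_batch` are hypothetical.

```python
from typing import List, Tuple

Example = Tuple[List[float], float]  # (features, target); toy stand-in for text data


def predict(w: List[float], x: List[float]) -> float:
    """Linear model output w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_loss(w: List[float], batch: List[Example]) -> float:
    """Mean squared-error loss 0.5 * (w . x - y)^2 over a batch."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in batch) / len(batch)


def mean_grad(w: List[float], batch: List[Example]) -> List[float]:
    """Gradient of mean_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return grad


def unlearning_step(
    w: List[float],
    forget_batch: List[Example],
    retain_batch: List[Example],
    lr: float = 0.1,
    retain_weight: float = 1.0,  # hypothetical knob balancing the two terms
) -> List[float]:
    """One step on the assumed objective L_retain * retain_weight - L_forget.

    The forget gradient is added (ascent) and the retain gradient is
    subtracted (descent).
    """
    g_forget = mean_grad(w, forget_batch)
    g_retain = mean_grad(w, retain_batch)
    return [
        wi + lr * gf - lr * retain_weight * gr
        for wi, gf, gr in zip(w, g_forget, g_retain)
    ]


# Toy data: the forget and retain examples use separate features,
# so the two gradient terms do not interfere with each other.
forget_batch = [([1.0, 0.0], 2.0)]
retain_batch = [([0.0, 1.0], 3.0)]
w_before = [1.5, 2.0]
w_after = unlearning_step(w_before, forget_batch, retain_batch)
```

In this toy setting the step moves the forget loss up and the retain loss down. In a real model the two gradients generally overlap, which is one reason the balance between the terms and the learning rate need tuning.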