This study investigates the concept of the `right to be forgotten' within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, with a focus on pre-trained models--a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning methods. Through rigorous evaluation using curated datasets from arXiv, books, and GitHub, we establish a robust benchmark for unlearning performance, demonstrating that these methods are over $10^5$ times more computationally efficient than retraining. Our results show that integrating gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering substantive insights into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.