Large Language Models (LLMs) have transformed natural language processing by learning from massive datasets, yet this rapid progress has also drawn legal scrutiny: their ability to unintentionally generate copyrighted content has already prompted several prominent lawsuits. In this work, we introduce SUV (Selective Unlearning for Verbatim data), a selective unlearning framework designed to prevent an LLM from memorizing copyrighted content while preserving its overall utility. Specifically, the proposed method constructs a dataset that captures instances of copyright infringement by the target LLM. With this dataset, we unlearn the content via Direct Preference Optimization (DPO), which replaces verbatim copyrighted content with plausible and coherent alternatives. Since DPO may degrade the LLM's performance on unrelated tasks, we integrate gradient projection and Fisher information regularization to mitigate this degradation. We validate our approach on a large-scale dataset of 500 famous books (predominantly copyrighted works) and demonstrate that SUV significantly reduces verbatim memorization with negligible impact on performance on unrelated tasks. Extensive experiments on both our dataset and public benchmarks confirm the scalability and efficacy of our approach, offering a promising solution for mitigating copyright risks in real-world LLM applications.
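The abstract names three ingredients: a DPO objective that prefers paraphrased alternatives over verbatim copyrighted continuations, a gradient-projection step, and a Fisher-information (EWC-style) penalty that protects performance on unrelated tasks. A minimal sketch of how these pieces could fit together is given below; the function names, the orthogonal-projection rule, and the diagonal-Fisher penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: prefer the paraphrased alternative
    ("chosen") over the verbatim copyrighted continuation ("rejected"),
    relative to a frozen reference model."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

def fisher_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """EWC-style regularizer (an assumed form of the paper's Fisher
    regularization): anchor parameters that are important for unrelated
    tasks, as weighted by a diagonal Fisher estimate, to their
    pre-unlearning values."""
    return lam * sum(
        (f * (p - p0).pow(2)).sum()
        for p, p0, f in zip(params, anchor_params, fisher_diag)
    )

def project_gradient(g_unlearn, g_retain):
    """Illustrative gradient projection: if the unlearning gradient
    conflicts with the retention gradient (negative dot product),
    remove its component along the retention direction."""
    dot = torch.dot(g_unlearn, g_retain)
    if dot < 0:
        g_unlearn = g_unlearn - dot / g_retain.pow(2).sum() * g_retain
    return g_unlearn
```

In this sketch, one training step would compute `dpo_loss` plus `fisher_penalty`, backpropagate, and then apply `project_gradient` per parameter before the optimizer update, so that unlearning the verbatim text does not push against directions needed by unrelated tasks.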