Large language models (LLMs) have made significant progress by pre-training on and memorizing a wide range of textual data. However, this process can raise privacy issues and violate data protection regulations. As a result, the ability to easily remove data related to individual users from such models, without degrading their predictive quality after the removal, is becoming increasingly important. To address these issues, we propose an efficient unlearning framework that updates LLMs without retraining the whole model after data removals, by introducing lightweight unlearning layers, learned with a selective teacher-student objective, into the transformers. In addition, we introduce a fusion mechanism that effectively combines different unlearning layers, each of which learns to forget a different set of data, to handle a sequence of forgetting operations. Experiments on classification and generation tasks demonstrate the effectiveness of our proposed methods compared to state-of-the-art baselines.
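The abstract does not spell out the form of the selective teacher-student objective. As a minimal sketch of one plausible reading, the student (the model with unlearning layers) could be kept close to the frozen teacher on retained examples and pushed away from it on examples to be forgotten. The function names, the use of KL divergence, the sign convention, and the `alpha` weight below are all assumptions made for illustration, not details taken from the text.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def selective_teacher_student_loss(teacher_logits, student_logits, forget_mask, alpha=1.0):
    """Hypothetical selective objective (an assumption, not the paper's stated loss).

    Retained examples contribute +KL(teacher || student), so minimizing the loss
    keeps the student close to the teacher there. Forgotten examples contribute
    -KL(teacher || student), so minimizing the loss moves the student away.
    """
    retain_term = 0.0
    forget_term = 0.0
    for t, s, forget in zip(teacher_logits, student_logits, forget_mask):
        kl = kl_divergence(softmax(t), softmax(s))
        if forget:
            forget_term += kl
        else:
            retain_term += kl
    return alpha * retain_term - forget_term


# Toy usage: two examples with two classes each; the second is marked for forgetting.
teacher = [[2.0, 0.0], [0.0, 2.0]]
student = [[2.0, 0.0], [2.0, 0.0]]
loss = selective_teacher_student_loss(teacher, student, forget_mask=[False, True])
```

In this toy call the student matches the teacher on the retained example and disagrees with it on the forgotten one, so the loss is negative. In a real setup only the parameters of the lightweight unlearning layers would be trained against such a loss while the rest of the model stays frozen.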