Recent research has shown that language models tend to memorize rare or unique token sequences in the training corpus. After a model is deployed, practitioners may be asked to delete personal information from it at an individual's request. Retraining the underlying model every time an individual exercises their right to be forgotten is computationally expensive. We employ a teacher-student framework and propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that must be forgotten. In our approach, multiple teachers are trained on disjoint sets; for each targeted sequence to be removed, we exclude the teacher trained on the set containing this sequence and aggregate the predictions of the remaining teachers to provide supervision during fine-tuning. Experiments on the LibriSpeech and WikiText-103 datasets show that the proposed method achieves better privacy-utility trade-offs than competing methods.
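The leave-one-out aggregation step can be illustrated with a minimal sketch. The abstract does not specify how teacher predictions are aggregated or which loss supervises the student, so this example assumes a plain average of next-token distributions and a cross-entropy objective; the function names `leave_one_out_target` and `cross_entropy` and the toy distributions are hypothetical.

```python
import math
from typing import List, Sequence


def leave_one_out_target(
    teacher_probs: Sequence[Sequence[float]], excluded: int
) -> List[float]:
    """Average the next-token distributions of every teacher except `excluded`.

    `excluded` is the index of the teacher whose training shard contained
    the sequence to be forgotten. Averaging is an assumption of this sketch.
    """
    if not 0 <= excluded < len(teacher_probs):
        raise IndexError("excluded teacher index out of range")
    kept = [p for i, p in enumerate(teacher_probs) if i != excluded]
    if not kept:
        raise ValueError("need at least two teachers")
    vocab_size = len(kept[0])
    return [sum(p[v] for p in kept) / len(kept) for v in range(vocab_size)]


def cross_entropy(
    target: Sequence[float], student: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy of the student distribution against the soft target."""
    return -sum(t * math.log(s + eps) for t, s in zip(target, student))


# Toy example: three teachers, a vocabulary of three tokens.
teachers = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],  # assumed: this teacher's shard held the target sequence
    [0.5, 0.3, 0.2],
]
target = leave_one_out_target(teachers, excluded=1)

# A student that still matches the excluded teacher incurs a higher loss
# than one that matches the leave-one-out target.
loss_memorized = cross_entropy(target, teachers[1])
loss_unlearned = cross_entropy(target, target)
```

In this sketch, fine-tuning the student to minimize `cross_entropy` against `target` pulls its predictions on the targeted sequence toward what models that never saw that sequence would predict.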