In the current AI era, users may ask AI companies to delete their data from training datasets because of privacy concerns. For a model owner, retraining the model without that data consumes significant computational resources. Machine unlearning has therefore emerged as a technique that allows model owners to remove requested training data, or an entire class, with little effect on model performance. However, for large-scale, complex data such as images or text, unlearning a class yields inferior results because it is difficult to identify the link between a class and the model. Inaccurate class deletion can lead to over-unlearning or under-unlearning. In this paper, to accurately define the class to be unlearned in complex data, we represent its semantic information with concepts rather than with image features or text tokens. This representation can cut the link between the model and the class, completely erasing the class's influence. To analyze the influence of concepts in complex data, we adopt a Post-hoc Concept Bottleneck Model and Integrated Gradients to precisely identify concepts across different classes. We then propose unlearning methods based on data poisoning with random and targeted labels. We test our methods on both image classification models and large language models (LLMs). The results consistently show that the proposed methods accurately erase targeted information from models while largely maintaining model performance.
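The two ingredients named above, scoring concepts with Integrated Gradients and then relabeling forget-class data with random or targeted labels, can be illustrated with a minimal, dependency-free Python sketch. It assumes a post-hoc concept bottleneck whose class head is a linear layer followed by softmax over concept activations, an all-zero concept baseline, and a forget set consisting of all samples of the class being unlearned. The function names (`integrated_gradients`, `top_concepts`, `poison_labels`) are hypothetical and not taken from the paper, and the subsequent fine-tuning on the relabeled data is not shown.

```python
import math
import random

# Hypothetical sketch: assumes a linear concept-to-class head, softmax(W c + b).


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def head_probs(W, b, c):
    """Class probabilities of the assumed linear head applied to concept activations c."""
    logits = [sum(w * x for w, x in zip(row, c)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def grad_class_prob(W, b, c, k):
    """Analytic gradient of p_k w.r.t. c: dp_k/dc_j = p_k * (W[k][j] - sum_m p_m W[m][j])."""
    p = head_probs(W, b, c)
    grads = []
    for j in range(len(c)):
        mean_w = sum(p[m] * W[m][j] for m in range(len(W)))
        grads.append(p[k] * (W[k][j] - mean_w))
    return grads


def integrated_gradients(W, b, c, k, baseline=None, steps=100):
    """Integrated Gradients of p_k over concepts, using a midpoint Riemann sum
    along the straight path from `baseline` (default: all-zero concepts) to `c`."""
    if baseline is None:
        baseline = [0.0] * len(c)
    avg_grad = [0.0] * len(c)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b0 + alpha * (x - b0) for x, b0 in zip(c, baseline)]
        for j, g in enumerate(grad_class_prob(W, b, point, k)):
            avg_grad[j] += g / steps
    return [(x - b0) * g for x, b0, g in zip(c, baseline, avg_grad)]


def top_concepts(attributions, n):
    """Indices of the n concepts with the largest attribution toward the class."""
    return sorted(range(len(attributions)), key=lambda j: attributions[j], reverse=True)[:n]


def poison_labels(samples, forget_class, num_classes, mode="random", target=None, seed=0):
    """Relabel forget-class samples for fine-tuning-based unlearning.

    mode="random":   each forget sample gets a uniformly random label from the other classes.
    mode="targeted": each forget sample gets the fixed label `target`.
    Samples from all other classes are returned unchanged.
    """
    if mode not in ("random", "targeted"):
        raise ValueError("mode must be 'random' or 'targeted'")
    rng = random.Random(seed)
    others = [y for y in range(num_classes) if y != forget_class]
    if mode == "targeted" and target not in others:
        raise ValueError("target must be a valid class different from forget_class")
    out = []
    for x, y in samples:
        if y == forget_class:
            y = rng.choice(others) if mode == "random" else target
        out.append((x, y))
    return out


if __name__ == "__main__":
    W = [[2.0, -1.0, 0.5], [-1.0, 1.5, 0.0]]  # toy head: 2 classes, 3 concepts
    b = [0.1, -0.2]
    attrs = integrated_gradients(W, b, [1.0, 0.5, 2.0], k=0)
    print("top concepts for class 0:", top_concepts(attrs, 2))
    print(poison_labels([(i, i % 3) for i in range(6)], forget_class=1, num_classes=3))
```

Because the path is straight and the gradient is exact, the attributions sum (up to quadrature error) to p_k(c) minus p_k(baseline). This completeness property is a quick sanity check on any Integrated Gradients implementation.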