Machine unlearning requires removing the information contributed by the forgetting data while retaining the necessary information from the remaining data. Despite recent advances in this area, existing methods focus mainly on how effectively the forgetting data is removed and overlook the damage that removal can do to the information learned from the remaining data, which leads to significant performance degradation after removal. Some methods try to repair performance on the remaining data afterwards, but the repair step can also bring the forgotten information back. This issue stems from the intricate intertwining of the forgetting and remaining data. Without adequately differentiating the influence of these two kinds of data on the model, existing algorithms risk either removing the forgetting data inadequately or losing valuable information from the remaining data unnecessarily. To address this shortcoming, this study presents a causal analysis of unlearning and introduces a novel framework termed Causal Machine Unlearning (CaMU). The framework intervenes on the information of the remaining data to disentangle the causal effects of the forgetting data from those of the remaining data. CaMU then eliminates the causal impact associated with the forgetting data while preserving the causal relevance of the remaining data. Comprehensive empirical results on various datasets and models suggest that CaMU improves performance on the remaining data and effectively minimizes the influence of the forgetting data. Notably, this work is the first to interpret deep model unlearning from the perspective of causality and to provide a solution based on causal analysis, which opens up new possibilities for future research in deep model unlearning.
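To make the problem setup concrete, the following is a minimal sketch of the two quantities the abstract refers to: performance on the remaining data and residual influence of the forgetting data. It is not the CaMU algorithm. It assumes a toy 1-D nearest-centroid classifier and uses exact retraining on the remaining data as the reference outcome that approximate unlearning methods typically try to match. The data and all function names (fit_centroids, predict, accuracy) are hypothetical and chosen only for illustration.

```python
# Toy illustration of the unlearning setup (NOT the CaMU algorithm).
# Assumption: a 1-D nearest-centroid classifier stands in for the model, and
# "unlearning" is exact retraining on the remaining data only.

def fit_centroids(samples):
    """Return {label: mean feature} for a list of (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def accuracy(centroids, samples):
    """Fraction of samples classified correctly."""
    correct = sum(1 for x, y in samples if predict(centroids, x) == y)
    return correct / len(samples)

# Hypothetical data: the forgetting set pulls the class-1 centroid toward class 0.
remaining = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
forgetting = [(3.0, 1), (4.0, 1)]

original = fit_centroids(remaining + forgetting)  # trained on all data
retrained = fit_centroids(remaining)              # reference "unlearned" model

print("remaining accuracy :", accuracy(original, remaining), "->", accuracy(retrained, remaining))
print("forgetting accuracy:", accuracy(original, forgetting), "->", accuracy(retrained, forgetting))
```

In this toy case the retrained model keeps its accuracy on the remaining data while its accuracy on the forgetting data drops, which is the outcome an unlearning method aims for without retraining from scratch. In deep models the two influences are not separable this cleanly, which is the difficulty the abstract describes.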