The right to be forgotten (RTBF) seeks to protect individuals from the lasting effects of their past actions, and it is implemented in practice through machine unlearning techniques. These techniques delete previously acquired knowledge from a model without requiring extensive retraining. However, they often overlook a critical issue: the unlearning process itself introduces bias. This bias arises from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which contaminates the remaining dataset and thereby degrades model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate bias at both the data and algorithm levels. Specifically, we introduce an intervention-based approach in which the knowledge to be forgotten is erased using a debiased dataset. In addition, we guide the forgetting procedure with counterfactual examples, which preserve the semantic consistency of the data without hurting performance on the remaining dataset. Experimental results show that our method outperforms existing machine unlearning baselines on the evaluation metrics.