Machine unlearning, which aims to efficiently remove the influence of specific data from trained models, is crucial for upholding data privacy regulations such as the ``right to be forgotten''. However, existing research predominantly evaluates unlearning methods on relatively balanced forget sets. This overlooks a common real-world scenario in which the data to be forgotten, such as a user's activity records, follows a long-tailed distribution. Our work is the first to investigate this critical research gap. We find that in such long-tailed settings, existing methods suffer from two key issues: \textit{Heterogeneous Unlearning Deviation} and \textit{Skewed Unlearning Deviation}. To address these challenges, we propose FaLW, a plug-and-play, instance-wise dynamic loss-reweighting method. FaLW assesses the unlearning state of each sample by comparing its predictive probability with the distribution of unseen data from the same class. Based on this assessment, it applies a forgetting-aware reweighting scheme, modulated by a balancing factor, to adaptively adjust the unlearning intensity of each sample. Extensive experiments demonstrate that FaLW achieves superior performance. Code is available in the \textbf{Supplementary Material}.
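To make the reweighting idea concrete, the following is a minimal, hypothetical sketch of the instance-wise scheme the abstract describes: each forget sample's predicted probability on its true class is compared against the distribution of probabilities the model assigns to unseen samples of the same class, and a balancing factor modulates the resulting per-sample weight. All function names, the standardized-deviation measure, and the power-law use of the balancing factor `gamma` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def falw_weights(forget_probs, forget_labels, unseen_probs, unseen_labels, gamma=1.0):
    """Hypothetical sketch of FaLW-style instance-wise loss reweighting.

    For each forget sample, compare its predictive probability on its true
    class with the distribution of probabilities the model assigns to
    *unseen* samples of the same class. Samples that still score far above
    that reference distribution are treated as under-forgotten and receive
    larger weights; `gamma` is an assumed balancing factor that modulates
    the unlearning intensity.
    """
    weights = np.empty(len(forget_probs))
    for i, (p, y) in enumerate(zip(forget_probs, forget_labels)):
        ref = unseen_probs[unseen_labels == y]        # same-class unseen data
        mu, sigma = ref.mean(), ref.std() + 1e-8      # reference distribution stats
        deviation = max((p - mu) / sigma, 0.0)        # how far above unseen behavior
        weights[i] = deviation ** gamma               # balancing factor modulates weight
    return weights / (weights.sum() + 1e-8)           # normalize per batch
```

In this sketch, a sample whose probability already matches the unseen-data distribution gets near-zero weight, so unlearning effort concentrates on samples whose influence is still detectable.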