Machine unlearning is an emerging technique that aims to remove the influence of specific data from trained models, thereby enhancing privacy protection. However, recent research has uncovered critical privacy vulnerabilities, showing that adversaries can exploit unlearning inversion to reconstruct the very data that was intended to be erased. Despite the severity of this threat, dedicated defenses remain lacking. To address this gap, we propose UnlearnShield, the first defense specifically tailored to counter unlearning inversion. UnlearnShield introduces directional perturbations in the cosine representation space and regulates them through a constraint module that jointly preserves model accuracy and forgetting efficacy, thereby reducing inversion risk while maintaining utility. Experiments demonstrate that UnlearnShield achieves a favorable trade-off among privacy protection, accuracy, and forgetting efficacy.
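The core mechanism described above can be illustrated with a minimal sketch. The function names (`directional_perturb`, `constrain`), the bisection-based constraint, and all parameter choices below are our own illustrative assumptions, not the paper's actual implementation: embeddings are L2-normalized so that distances are measured by cosine similarity, a perturbation direction shifts an embedding on the unit sphere, and a constraint step shrinks the perturbation until the cosine similarity to the original embedding stays above a utility threshold.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project a vector onto the unit sphere (cosine representation space)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def directional_perturb(z, direction, alpha):
    """Shift embedding z along a chosen direction, then re-normalize.

    `alpha` controls perturbation strength; the direction and strength here
    are placeholders, not the paper's actual perturbation design.
    """
    z_n = l2_normalize(z)
    d_n = l2_normalize(direction)
    return l2_normalize(z_n + alpha * d_n)

def constrain(z, z_pert, min_cos):
    """Illustrative constraint module: shrink the perturbation by bisection
    until cos(z, result) >= min_cos, approximating a utility-preserving bound."""
    z_n = l2_normalize(z)
    lo, hi = 0.0, 1.0  # interpolation weight toward the perturbed point
    for _ in range(30):
        mid = (lo + hi) / 2
        cand = l2_normalize((1 - mid) * z_n + mid * z_pert)
        if float(cand @ z_n) >= min_cos:
            lo = mid  # constraint satisfied: allow a stronger perturbation
        else:
            hi = mid  # constraint violated: pull back toward the original
    return l2_normalize((1 - lo) * z_n + lo * z_pert)
```

A caller would perturb the unlearned model's embeddings and then apply the constraint, trading off inversion resistance (a lower `min_cos` allows larger shifts) against preserved utility (a higher `min_cos` keeps representations close to the original).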