Threats, Attacks, and Defenses in Machine Unlearning: A Survey

Machine Unlearning (MU) has recently gained considerable attention for its potential to achieve Safe AI by removing the influence of specific data from trained machine learning models. This process, known as knowledge removal, addresses AI governance concerns about training data, such as quality, sensitivity, copyright restrictions, and obsolescence. It is also crucial for complying with privacy requirements such as the Right to Be Forgotten. Furthermore, effective knowledge removal mitigates the risk of harmful outcomes by safeguarding against biases, misinformation, and unauthorized data exploitation, thereby enhancing the safe and responsible use of AI systems. Efforts have been made to design efficient unlearning approaches, and MU services are being examined for integration with existing Machine Learning as a Service (MLaaS) offerings, allowing users to submit requests to remove specific data from the training corpus. However, recent research highlights vulnerabilities in machine unlearning systems, such as information leakage and malicious unlearning requests, that can lead to significant security and privacy concerns. Moreover, research indicates that unlearning methods and prevalent attacks play diverse roles within MU systems. For instance, unlearning can act as a mechanism to recover models from backdoor attacks, while backdoor attacks themselves can serve as an evaluation metric for unlearning effectiveness. This underscores the complex interplay among these mechanisms in maintaining system functionality and safety. Despite the large number of studies on threats, attacks, and defenses in machine unlearning, a comprehensive review that categorizes their taxonomy, methods, and solutions is still missing. This survey aims to fill that gap, offering insights for future research directions and practical implementations.

