Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks

Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations such as the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often causes subtle performance degradation, which can have unintended negative side effects. In this work, we show that such degradation can be amplified into adversarial attacks. We introduce the notion of \textbf{unlearning corruption attacks}, in which an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally during training, and accuracy collapses only after unlearning is applied. Technically, we formulate the attack as a bi-level optimization problem. To overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.
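
The bi-level formulation described in the abstract can be written out as a rough sketch. The abstract gives no notation, so every symbol below is an assumption introduced for illustration: $G$ is the clean training graph, $\Delta$ the injected nodes together with their features and edges, $B$ an injection budget, $\mathcal{U}$ the black-box unlearning procedure, $f_s$ the surrogate model, and $\eta$ and $T$ the step size and step count of the gradient-based approximation. The gradient-ascent form of that approximation is one plausible reading of "gradient-based updates", not necessarily the update the authors use.

```latex
% Outer problem (assumed form): pick injected nodes \Delta, within budget B,
% that maximize the loss of the model obtained AFTER \Delta is unlearned.
\[
\max_{\Delta,\ |\Delta| \le B}\;
  \mathcal{L}_{\mathrm{atk}}\bigl(\theta^{u}(\Delta);\, G,\, \hat{Y}\bigr)
\]

% Inner problems (assumed form): train on the poisoned graph, then unlearn \Delta.
\[
\theta^{*}(\Delta) = \arg\min_{\theta}\,
  \mathcal{L}_{\mathrm{train}}(\theta;\, G \cup \Delta),
\qquad
\theta^{u}(\Delta) = \mathcal{U}\bigl(\theta^{*}(\Delta),\, G \cup \Delta,\, \Delta\bigr)
\]

% Hypothetical differentiable stand-in for the black-box \mathcal{U}:
% T gradient-ascent steps on the loss of the nodes being forgotten.
\[
\theta_{0} = \theta^{*}(\Delta),
\qquad
\theta_{t+1} = \theta_{t} + \eta\, \nabla_{\theta}\,
  \mathcal{L}_{\mathrm{train}}(\theta_{t};\, \Delta),
\qquad
\theta^{u}(\Delta) \approx \theta_{T}
\]

% Pseudo-labels from the surrogate model replace the labels the attacker lacks.
\[
\hat{Y}_{v} = \arg\max_{c}\, f_{s}(G)_{v,c}
\]
```

Under this reading, the pseudo-labels $\hat{Y}$ stand in for labels the attacker cannot observe, and the unrolled update makes the outer objective differentiable with respect to any continuous part of $\Delta$, such as injected node features.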
