ADAGE: Active Defenses Against GNN Extraction

Graph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic state prediction, and recommendation systems. Because building powerful GNNs requires large amounts of training data, substantial computing resources, and human expertise, these models are lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse: an attacker can leverage heterogeneous signals, ranging from node labels to high-dimensional node embeddings, to create a local copy of the target GNN at a fraction of the original training cost. This diversity makes the design of effective and general defenses challenging, and existing defenses usually focus on one particular stealing setup. Additionally, they only provide means to identify stolen model copies rather than to prevent the attack. To close this gap, we propose the first general Active Defense Against GNN Extraction (ADAGE). ADAGE builds on the observation that stealing a model's full functionality requires highly diverse queries that leak its behavior across the input space. Our defense monitors this query diversity and progressively perturbs outputs as the accumulated leakage grows. In contrast to prior work, ADAGE can prevent stealing across all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the degree of rendering stealing impossible, whilst preserving predictive performance on downstream tasks. ADAGE thereby contributes towards securely sharing valuable GNNs in the future.
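
The abstract does not say how query diversity is measured or how outputs are perturbed, so the following is only a minimal sketch of the general idea under stated assumptions, not ADAGE itself. It assumes the input space is split into a fixed grid of cells, that "leakage" is the fraction of cells a user has queried, and that the defense adds Gaussian noise whose scale grows with that fraction. The class `DiversityMonitor` and the parameters `cell_size`, `total_cells`, and `max_sigma` are all hypothetical.

```python
import math
import random


class DiversityMonitor:
    """Toy per-user query-diversity tracker (illustrative, not the paper's algorithm)."""

    def __init__(self, cell_size=0.5, total_cells=100, max_sigma=1.0, seed=0):
        self.cell_size = cell_size      # width of one grid cell (assumed)
        self.total_cells = total_cells  # assumed number of cells in the input space
        self.max_sigma = max_sigma      # noise standard deviation at full coverage
        self.visited = set()            # grid cells this user has queried so far
        self.rng = random.Random(seed)

    def leakage(self):
        """Fraction of the input space covered by this user's queries."""
        return min(1.0, len(self.visited) / self.total_cells)

    def sigma(self):
        """Noise scale that grows monotonically with the covered fraction."""
        return self.max_sigma * self.leakage() ** 2

    def answer(self, query, clean_output):
        """Record the query, then return the output with noise at the current scale."""
        cell = tuple(math.floor(x / self.cell_size) for x in query)
        self.visited.add(cell)
        s = self.sigma()
        return [y + self.rng.gauss(0.0, s) for y in clean_output]


# A user who keeps querying one region accumulates almost no leakage,
# while a user who sweeps the whole grid is penalized with large noise.
benign = DiversityMonitor()
for _ in range(50):
    benign.answer([0.1, 0.2], [1.0])

attacker = DiversityMonitor()
for i in range(10):
    for j in range(10):
        attacker.answer([i * 0.5 + 0.1, j * 0.5 + 0.1], [1.0])
```

In this toy setup the quadratic penalty keeps the noise negligible for users with narrow query patterns, which mirrors the stated goal of preserving performance on downstream tasks. The actual penalty function and diversity estimate used by ADAGE may differ.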
