Graph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic state prediction, and recommendation systems. Because building powerful GNNs requires large amounts of training data, substantial computing resources, and human expertise, the models become lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse: an attacker can leverage various heterogeneous signals, ranging from node labels to high-dimensional node embeddings, to create a local copy of the target GNN at a fraction of the original training costs. This diversity in the threat vector makes the design of effective and general defenses challenging, and existing defenses usually focus on one particular stealing setup. Additionally, they solely provide means to identify stolen model copies rather than preventing the attack. To close this gap, we propose the first general Active Defense Against GNN Extraction (ADAGE). By analyzing the queries to the GNN, tracking their diversity in terms of proximity to different communities identified in the underlying graph, and increasing the defense strength with the growing fraction of communities that have been queried, ADAGE can prevent stealing in all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the point of rendering stealing impossible, whilst not harming predictive performance for legitimate users. ADAGE thereby contributes towards securely sharing valuable GNNs in the future.
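The community-coverage idea underlying the defense can be sketched as follows. This is a minimal illustration of the abstract's description only, not the authors' implementation: the class name, the per-user bookkeeping, and the assumed `node_to_community` mapping (e.g., from an off-the-shelf community-detection algorithm) are all hypothetical.

```python
from collections import defaultdict


class CommunityCoverageDefense:
    """Hedged sketch: track which graph communities a user's queries
    touch, and let a defense penalty grow with community coverage."""

    def __init__(self, num_communities, node_to_community):
        # node_to_community: assumed mapping node id -> community label,
        # e.g., produced by a community-detection algorithm on the graph.
        self.num_communities = num_communities
        self.node_to_community = node_to_community
        # Per-user set of communities already queried.
        self.queried = defaultdict(set)

    def defense_strength(self, user_id, queried_node):
        """Record the query and return a penalty in [0, 1] that grows
        with the fraction of communities this user has queried."""
        self.queried[user_id].add(self.node_to_community[queried_node])
        return len(self.queried[user_id]) / self.num_communities
```

The intuition: a legitimate user whose queries stay near a few communities keeps the penalty low, while an extraction attacker who must cover many communities to clone the model drives the penalty up, at which point the server can degrade or withhold its outputs accordingly.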