Graph Neural Networks (GNNs) have attracted significant interest from researchers due to their impressive performance on graph learning tasks. However, like other deep neural networks, GNNs are vulnerable to adversarial attacks. In existing adversarial attack methods for GNNs, the metric between the attacked graph and the original graph is usually the attack budget or a measure of global graph properties. However, we find that it is possible to generate attack graphs that disrupt the primary semantics even within these constraints. To address this problem, we propose Adversarial Attacks on High-level Semantics in Graph Neural Networks (AHSG), a graph structure attack model that ensures the primary semantics are retained. By applying convolutional operations to graph data, the latent representation of each node captures rich semantic information, comprising both task-relevant primary semantics and task-irrelevant secondary semantics. Exploiting the latent representations of same-class nodes, which share the same primary semantics, makes it possible to modify secondary semantics while preserving the primary semantics. Finally, the latent representations carrying the attack effect are mapped to an attack graph using the Projected Gradient Descent (PGD) algorithm. By attacking graph deep learning models equipped with advanced defense strategies, we validate that AHSG achieves superior attack effectiveness compared to other attack methods. Additionally, we employ Contextual Stochastic Block Models (CSBMs) as a proxy for the primary semantics to examine the attacked graphs, confirming that AHSG leaves the original primary semantics of the graph almost intact.
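The final step above maps perturbed latent representations back to a discrete attack graph via PGD under an attack budget. As a rough illustration only, here is a minimal sketch of the budget-constrained PGD loop in the standard relaxed topology-attack formulation: edge flips are relaxed to a continuous vector `s` in [0,1], updated by gradient ascent and projected onto the budget simplex by bisection. The names `project_budget`, `pgd_attack`, and the `grad_fn` callback are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def project_budget(s, budget):
    """Project s onto {s in [0,1]^n : sum(s) <= budget}.

    Uses bisection on the shift mu so that clip(s - mu, 0, 1) sums to the budget,
    as in standard PGD topology attacks.
    """
    s = np.clip(s, 0.0, 1.0)
    if s.sum() <= budget:
        return s
    lo, hi = 0.0, s.max()
    for _ in range(50):  # bisection to high precision
        mu = 0.5 * (lo + hi)
        if np.clip(s - mu, 0.0, 1.0).sum() > budget:
            lo = mu
        else:
            hi = mu
    return np.clip(s - hi, 0.0, 1.0)

def pgd_attack(grad_fn, n_edges, budget, steps=100, lr=0.1):
    """Gradient-ascent PGD over the relaxed edge-perturbation vector s.

    grad_fn(s) is assumed to return the gradient of the attack loss
    (e.g., a surrogate classification loss) with respect to s.
    """
    s = np.zeros(n_edges)
    for _ in range(steps):
        s = project_budget(s + lr * grad_fn(s), budget)
    return s  # a final rounding/sampling step would discretize s into edge flips
```

In practice the continuous solution `s` is discretized (e.g., by sampling or taking the largest entries) to obtain the attacked adjacency matrix within the budget.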