Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data, as well as by backdoor attacks that additionally manipulate the test data. These vulnerabilities have led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning and backdoor attacks targeting the node features of a given graph. Our certificates are white-box and based upon $(i)$ the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and $(ii)$ a novel reformulation of the bilevel optimization problem describing poisoning as a mixed-integer linear program. We then leverage our framework to provide fundamental insights into the role of graph structure and its connectivity in the worst-case robustness behavior of convolution-based and PageRank-based GNNs. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for neural networks, which may be of independent interest beyond graph-related tasks.
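Since the certificates build on the neural tangent kernel, a minimal sketch of the underlying object may be helpful: the empirical NTK of a network $f_\theta$ is the Gram matrix $K(x, x') = \nabla_\theta f_\theta(x)^\top \nabla_\theta f_\theta(x')$. The toy example below (plain NumPy, a one-hidden-layer ReLU network in NTK parameterization, all names our own) is an illustration under these simplifying assumptions, not the paper's GNN construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 2048  # input dimension, hidden width (wide, so the NTK is near its limit)

# NTK parameterization: f(x) = (1/sqrt(m)) * v^T relu(W x)
W = rng.normal(size=(m, d))
v = rng.normal(size=m)

def grad_params(x):
    """Gradient of f(x) with respect to all parameters (W, v), flattened."""
    pre = W @ x                                  # pre-activations, shape (m,)
    df_dv = np.maximum(pre, 0.0) / np.sqrt(m)    # d f / d v
    # d f / d W[i, j] = v[i] * 1{pre[i] > 0} * x[j] / sqrt(m)
    df_dW = np.outer(v * (pre > 0), x) / np.sqrt(m)
    return np.concatenate([df_dW.ravel(), df_dv])

def empirical_ntk(X):
    """K[a, b] = <grad f(X[a]), grad f(X[b])> over all parameters."""
    G = np.stack([grad_params(x) for x in X])
    return G @ G.T

X = rng.normal(size=(4, d))   # four toy inputs
K = empirical_ntk(X)          # 4x4 symmetric positive semidefinite kernel matrix
```

Because $K$ is a Gram matrix of parameter gradients, it is symmetric and positive semidefinite by construction; in the infinite-width limit it becomes deterministic and fully characterizes the training dynamics, which is what makes a kernel-based reformulation of poisoning tractable.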