We analyze the performance of graph neural network (GNN) architectures from the perspective of random graph theory. Our approach promises to complement existing lenses on GNN analysis, such as combinatorial expressive power and worst-case adversarial analysis, by connecting the performance of GNNs to typical-case properties of the training data. First, we theoretically characterize the nodewise accuracy of one- and two-layer GCNs relative to the contextual stochastic block model (cSBM) and related models. We additionally prove that GCNs cannot beat linear models under certain circumstances. Second, we numerically map the recoverability thresholds, in terms of accuracy, of four diverse GNN architectures (GCN, GAT, SAGE, and Graph Transformer) under a variety of assumptions about the data. Sample results of this second analysis include: heavy-tailed degree distributions enhance GNN performance; GNNs can work well on strongly heterophilous graphs; and SAGE and Graph Transformer can perform well on arbitrarily noisy edge data, whereas no architecture handles sufficiently noisy feature data well. Finally, we show how both specific higher-order structures in synthetic data and the mix of empirical structures in real data have dramatic effects (usually negative) on GNN performance.