Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel

The fundamental principle of Graph Neural Networks (GNNs) is to exploit the structural information of the data by aggregating neighboring nodes using a "graph convolution", in conjunction with a suitable choice of network architecture, such as depth and activation functions. Understanding the influence of each design choice on network performance is therefore crucial. Convolutions based on the graph Laplacian have emerged as the dominant choice, with symmetric normalization of the adjacency matrix the most widely adopted. However, some empirical studies show that row normalization of the adjacency matrix outperforms symmetric normalization in node classification. Despite the widespread use of GNNs, there is no rigorous theoretical study of the representation power of these convolutions that could explain this behavior. Similarly, the empirical observation that linear GNNs perform on par with non-linear ReLU GNNs lacks a rigorous theory. In this work, we theoretically analyze the influence of different aspects of the GNN architecture using the Graph Neural Tangent Kernel in a semi-supervised node classification setting. Under the population Degree Corrected Stochastic Block Model, we prove that: (i) linear networks capture the class information as well as ReLU networks; (ii) row normalization preserves the underlying class structure better than other convolutions; (iii) performance degrades with network depth due to over-smoothing, but the loss of class information is slowest under row normalization; (iv) skip connections retain the class information even at infinite depth, thereby eliminating over-smoothing. Finally, we validate our theoretical findings numerically and on real datasets such as Cora and Citeseer.
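To make the two convolutions named in the abstract concrete, here is a minimal sketch of the usual definitions: symmetric normalization $D^{-1/2} A D^{-1/2}$ and row normalization $D^{-1} A$, where $A$ is the adjacency matrix and $D$ the diagonal degree matrix. The abstract does not spell out these formulas, so the definitions, the function names, the toy graph, and the choice to include self-loops are all assumptions made for illustration, not the authors' implementation. The code uses only the Python standard library and assumes every node has non-zero degree.

```python
import math


def degrees(adj):
    """Degree of each node: the sum of its row in the adjacency matrix."""
    return [sum(row) for row in adj]


def row_normalize(adj):
    """Row normalization D^{-1} A: each row is divided by its node's degree."""
    deg = degrees(adj)
    return [[a / deg[i] for a in row] for i, row in enumerate(adj)]


def sym_normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = degrees(adj)
    return [
        [a / math.sqrt(deg[i] * deg[j]) for j, a in enumerate(row)]
        for i, row in enumerate(adj)
    ]


# Hypothetical toy graph: a path on 4 nodes, with self-loops added (an assumption).
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

S_row = row_normalize(A)
S_sym = sym_normalize(A)

# Each row of S_row sums to 1, so aggregation is a plain average over neighbors.
row_sums = [sum(row) for row in S_row]

# S_sym stays symmetric for an undirected graph; S_row generally does not.
sym_is_symmetric = all(
    math.isclose(S_sym[i][j], S_sym[j][i]) for i in range(4) for j in range(4)
)
```

The structural difference visible here is that row normalization averages over a node's neighbors regardless of their degrees, while symmetric normalization also rescales each neighbor's contribution by that neighbor's degree.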
