Graph Neural Networks (GNNs) have shown considerable promise in graph representation learning across a range of applications. Training a GNN requires initializing the weight matrices of each layer, which is typically done with classical methods such as Xavier initialization. However, these methods were originally designed to stabilize the variance of hidden embeddings and gradients across the layers of Feedforward Neural Networks (FNNs) and Convolutional Neural Networks (CNNs), so as to avoid vanishing gradients and maintain steady information flow. Applied to GNNs, classical initializations disregard the effect of the input graph structure and message passing on variance. In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations arises from the combined effect of the activation function, hidden dimension, graph structure, and message passing. To better account for these factors, we propose a new initialization method for Variance Instability Reduction within GNN Optimization (Virgo), which naturally tends to equalize forward and backward variances across successive layers. We conduct comprehensive experiments on 15 datasets and show that Virgo can lead to superior model performance and more stable variance at initialization on node classification, link prediction, and graph classification tasks. Code is available at https://github.com/LspongebobJH/virgo_icml2023.
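The claim that classical initializations ignore graph structure and message passing can be illustrated with a small numerical sketch. The code below is not Virgo and is not taken from the linked repository; it is a toy experiment under assumptions chosen here for illustration: a random undirected graph, mean aggregation with self-loops, ReLU activations, and square Xavier-uniform weight matrices. All function names (`xavier_uniform`, `mean_aggregation_matrix`, `layerwise_variance`) are hypothetical helpers. It tracks the variance of hidden embeddings layer by layer, with and without message passing, using the same weights in both cases.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform init, so that Var(W) = 2 / (fan_in + fan_out)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def mean_aggregation_matrix(num_nodes, edge_prob, rng):
    """Row-normalized adjacency, with self-loops, of a random undirected graph.

    The random graph and the mean aggregator are assumptions of this sketch.
    """
    upper = np.triu(rng.random((num_nodes, num_nodes)) < edge_prob, k=1)
    adj = (upper | upper.T).astype(float) + np.eye(num_nodes)
    return adj / adj.sum(axis=1, keepdims=True)


def layerwise_variance(x, agg, num_layers, seed):
    """Return the variance of the embeddings at the input and after each layer.

    If `agg` is None the layers form a plain feedforward stack; otherwise each
    layer applies the linear map, then message passing, then ReLU.
    """
    rng = np.random.default_rng(seed)
    h = x
    variances = [float(h.var())]
    for _ in range(num_layers):
        w = xavier_uniform(h.shape[1], h.shape[1], rng)
        h = h @ w                      # linear transform
        if agg is not None:
            h = agg @ h                # mean aggregation over neighbors
        h = np.maximum(h, 0.0)         # ReLU
        variances.append(float(h.var()))
    return variances


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 64))                 # unit-variance node features
agg = mean_aggregation_matrix(500, 0.02, rng)      # average degree around 10

# Same seed, hence identical weights, for both stacks.
mlp_var = layerwise_variance(x, None, num_layers=4, seed=1)
gnn_var = layerwise_variance(x, agg, num_layers=4, seed=1)

print("feedforward:", [f"{v:.4f}" for v in mlp_var])
print("with message passing:", [f"{v:.4f}" for v in gnn_var])
```

In this setup the variance with message passing falls below the feedforward variance from the first layer onward, because averaging over neighbors shrinks the spread of the embeddings in a way that the fan-in and fan-out terms of Xavier initialization do not account for. The size of the gap depends on the graph, the aggregator, and the activation, which is the dependence the abstract points to.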