Oversmoothing has been identified as a primary bottleneck for multi-layer graph neural networks (GNNs). Multiple analyses have examined how and why oversmoothing occurs; however, none of this prior work addresses how optimization behaves in the oversmoothing regime. In this work, we show the presence of $\textit{gradient oversmoothing}$, which prevents optimization during training. We further show that GNNs with residual connections, a well-known remedy for gradient flow in deep architectures, introduce $\textit{gradient expansion}$, a phenomenon in which gradients explode in diverse directions. Adding residual connections alone therefore cannot make a GNN deep. Our analysis reveals that constraining the Lipschitz bound of each layer neutralizes gradient expansion. To this end, we provide a simple yet effective normalization method that prevents gradient expansion. An empirical study shows that residual GNNs with hundreds of layers can be trained efficiently with the proposed normalization without compromising performance. Additional studies show that the empirical observations corroborate our theoretical analysis.
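The abstract does not specify the normalization itself; as a minimal sketch, one standard way to constrain the Lipschitz bound of a layer's feature transform is to rescale its weight matrix so that its spectral norm stays below a chosen constant. The function names and the residual-layer form below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def lipschitz_normalize(W, c=1.0):
    """Rescale W so its spectral norm (the Lipschitz constant of the
    linear map x -> W x) is at most c. Assumed illustrative scheme."""
    sigma = np.linalg.norm(W, 2)  # largest singular value of W
    return W if sigma <= c else W * (c / sigma)

def residual_gnn_layer(H, A_hat, W, c=1.0):
    """One hypothetical residual graph-convolution layer,
    H + ReLU(A_hat @ H @ W_n), with a Lipschitz-bounded weight W_n."""
    W_n = lipschitz_normalize(W, c)
    return H + np.maximum(A_hat @ H @ W_n, 0.0)
```

Because each normalized layer's feature transform is at most `c`-Lipschitz, stacking many such residual layers keeps the per-layer amplification of gradients bounded, which is the mechanism the abstract attributes to neutralizing gradient expansion.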