Oversmoothing is a central challenge of building more powerful Graph Neural Networks (GNNs). While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when applying graph convolutions -- an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. By quantifying these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM), we show that oversmoothing happens once the mixing effect starts to dominate the denoising effect, and the number of layers required for this transition is $O(\log N/\log (\log N))$ for sufficiently dense graphs with $N$ nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR), or equivalently, the effects of initial residual connections on oversmoothing. Our results suggest that while PPR mitigates oversmoothing at deeper layers, PPR-based architectures still achieve their best performance at a shallow depth and are outperformed by the graph convolution approach on certain graphs. Finally, we support our theoretical results with numerical experiments, which further suggest that the oversmoothing phenomenon observed in practice can be magnified by the difficulty of optimizing deep GNN models.
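The two effects described above can be observed in a small simulation. The following is a minimal sketch, not the paper's experimental setup: it assumes a two-class CSBM with one-dimensional Gaussian features, a random-walk-normalized operator $D^{-1}A$ with self-loops, and an APPNP-style iteration as a stand-in for PPR. The function names and all parameter values (`n`, `p`, `q`, `mu`, `sigma`, `alpha`) are hypothetical choices made for illustration. The gap between the class means tracks the mixing effect, and the within-class standard deviation tracks the denoising effect.

```python
import numpy as np


def sample_csbm(n, p, q, mu, sigma, rng):
    """Sample a two-class CSBM (illustrative parameterization).

    n must be even: the first n/2 nodes are class 0, the rest class 1.
    Edges appear with probability p within a class and q across classes.
    Features are 1-D Gaussians centered at -mu (class 0) and +mu (class 1).
    """
    labels = np.repeat([0, 1], n // 2)
    same_class = labels[:, None] == labels[None, :]
    probs = np.where(same_class, p, q)
    # Sample the strict upper triangle, then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep every row sum positive
    x = np.where(labels == 0, -mu, mu) + sigma * rng.standard_normal(n)
    return adj, x, labels


def smoothing_trace(adj, x, labels, num_layers, alpha=0.0):
    """Return (layer, class-mean gap, within-class spread) for each layer.

    alpha = 0 applies plain graph convolutions h <- P h with P = D^{-1} A.
    alpha > 0 applies h <- (1 - alpha) P h + alpha x, an assumed
    APPNP-style iteration with an initial residual connection.
    """
    op = adj / adj.sum(axis=1, keepdims=True)
    h = x.copy()
    trace = []
    for layer in range(num_layers + 1):
        gap = abs(h[labels == 1].mean() - h[labels == 0].mean())
        spread = 0.5 * (h[labels == 0].std() + h[labels == 1].std())
        trace.append((layer, gap, spread))
        h = (1.0 - alpha) * (op @ h) + alpha * x
    return trace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, x, labels = sample_csbm(n=400, p=0.1, q=0.02, mu=1.0, sigma=1.0, rng=rng)
    for name, alpha in [("graph convolution", 0.0), ("PPR-style", 0.1)]:
        print(name)
        for layer, gap, spread in smoothing_trace(adj, x, labels, 8, alpha):
            print(f"  layer {layer}: gap={gap:.3f}  spread={spread:.3f}")
```

In this sketch, the classes are easiest to separate at the depth where the gap is largest relative to the spread. Under the stated assumptions, the residual term with `alpha > 0` keeps the gap from collapsing at larger depths, which matches the qualitative claim about PPR above.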