A common issue in graph learning under the semi-supervised setting is gradient scarcity: when a graph is learned by minimizing a loss on a subset of nodes, edges between unlabelled nodes that are far from labelled ones receive zero gradients. The phenomenon was first described when optimizing the graph and the weights of a Graph Convolutional Network (GCN) with a joint optimization algorithm. In this work, we give a precise mathematical characterization of this phenomenon and prove that it also emerges in bilevel optimization, where an additional dependency exists between the parameters of the problem. While for GCNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that the gradient magnitude decreases exponentially with the distance to labelled nodes. To alleviate this issue, we study several solutions: latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, and optimizing on a larger graph than the original one, with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and demonstrate the effectiveness of the proposed solutions.
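The finite-receptive-field mechanism can be illustrated on a toy problem. The sketch below is not the setup used in the experiments; it assumes a path graph with self-loops, a single labelled node, and a two-layer linear propagation without trainable weights, and it estimates the gradient of the loss with respect to each edge by central finite differences. All names (`path_adjacency`, `propagate`, `edge_gradient`, and so on) are hypothetical and chosen for illustration only.

```python
# Toy illustration of gradient scarcity (assumed setup, not the paper's model):
# a path graph 0-1-2-...-7 with self-loops, only node 0 labelled, and a
# two-layer linear propagation out = A (A x) with no trainable weights.

N = 8           # number of nodes
LABELLED = 0    # index of the only labelled node
TARGET = 1.0    # arbitrary regression target for the labelled node


def path_adjacency(n):
    """Dense adjacency matrix of a path graph with self-loops."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


def propagate(adj, x):
    """One propagation step: multiply the adjacency matrix by a feature vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def loss(adj, x):
    """Squared error on the labelled node after two propagation steps."""
    out = propagate(adj, propagate(adj, x))
    return (out[LABELLED] - TARGET) ** 2


def edge_gradient(adj, x, i, j, eps=1e-4):
    """Central finite difference of the loss w.r.t. the undirected edge {i, j}."""
    def perturbed(delta):
        a = [row[:] for row in adj]
        a[i][j] += delta
        a[j][i] += delta
        return loss(a, x)
    return (perturbed(eps) - perturbed(-eps)) / (2 * eps)


features = [float(k + 1) for k in range(N)]
adjacency = path_adjacency(N)

# Gradient of the loss with respect to each edge (i, i + 1) of the path.
edge_grads = {
    (i, i + 1): edge_gradient(adjacency, features, i, i + 1) for i in range(N - 1)
}

for edge, grad in edge_grads.items():
    print(edge, grad)
```

In this toy setting, only the edges (0, 1) and (1, 2) receive a non-zero gradient. Every edge whose endpoints both lie more than one hop from the labelled node is outside the two-layer receptive field, so perturbing it leaves the loss unchanged.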