Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power: they cannot count triangles (the backbone of most LP heuristics) and cannot distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works have resorted to subgraph-based methods, which achieve state-of-the-art performance for LP but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.
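To make the idea of passing sketches as messages concrete, the following is a minimal illustrative sketch, not the ELPH implementation. It assumes MinHash as the sketch type, and the helper names (`minhash`, `merge`, `common_neighbors_estimate`) and the toy graph are hypothetical. It shows two properties such an approach could rely on: sketches of neighborhoods can be combined with an elementwise minimum, so a node can aggregate its neighbors' sketches like ordinary messages, and the number of common neighbors of a candidate link can be estimated from two sketches without building the link's subgraph.

```python
import hashlib

def _hash(seed: int, node: int) -> int:
    # Deterministic 64-bit hash of (seed, node); each seed acts as one hash function.
    data = f"{seed}:{node}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(nodes, num_perm=256):
    # MinHash signature of a non-empty node set: the minimum hash under each seed.
    return [min(_hash(s, v) for v in nodes) for s in range(num_perm)]

def merge(sig_a, sig_b):
    # Signature of the union of two sets: elementwise minimum.
    # This is the operation that lets sketches be aggregated as messages.
    return [min(x, y) for x, y in zip(sig_a, sig_b)]

def jaccard_estimate(sig_a, sig_b):
    # Fraction of matching slots estimates |A ∩ B| / |A ∪ B|.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def common_neighbors_estimate(sig_u, sig_v, deg_u, deg_v):
    # From J = |A ∩ B| / (|A| + |B| - |A ∩ B|): |A ∩ B| = J (|A| + |B|) / (1 + J).
    # Assumption: exact degrees are used for the set sizes to keep the example short.
    j = jaccard_estimate(sig_u, sig_v)
    return j * (deg_u + deg_v) / (1.0 + j)

# Hypothetical toy graph: neighbor sets of two candidate link endpoints.
neighbors = {0: {2, 3, 4, 5}, 1: {3, 4, 5, 6}}
sketches = {v: minhash(nbrs) for v, nbrs in neighbors.items()}

estimate = common_neighbors_estimate(
    sketches[0], sketches[1], len(neighbors[0]), len(neighbors[1])
)
exact = len(neighbors[0] & neighbors[1])
print(f"estimated common neighbors: {estimate:.2f} (exact: {exact})")
```

The estimate is approximate and its error shrinks as `num_perm` grows. The union property of `merge` is exact, which is why a sketch of a multi-hop neighborhood could be built from one-hop sketches by repeated aggregation.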