Graph neural networks (GNNs) learn to represent nodes by aggregating information from their neighbors. As GNNs increase in depth, their receptive field grows exponentially, leading to high memory costs. Several existing methods address this by sampling a small subset of nodes, scaling GNNs to much larger graphs. These methods are primarily evaluated on homophilous graphs, where neighboring nodes often share the same label. However, most of these methods rely on static heuristics that may not generalize across different graphs or tasks. We argue that the sampling method should be adaptive, adjusting to the complex structural properties of each graph. To this end, we introduce GRAPES, an adaptive sampling method that learns to identify the set of nodes crucial for training a GNN. GRAPES trains a second GNN to predict node sampling probabilities by optimizing the downstream task objective. We evaluate GRAPES on various node classification benchmarks, involving homophilous as well as heterophilous graphs. We demonstrate GRAPES' effectiveness in accuracy and scalability, particularly in multi-label heterophilous graphs. Unlike other sampling methods, GRAPES maintains high accuracy even with smaller sample sizes and, therefore, can scale to massive graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
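The memory argument above can be illustrated with a minimal sketch. It compares the full receptive field of a target node with a layer-wise sample that keeps a fixed number of new nodes per layer. This is not the GRAPES implementation: the helper names (`make_tree`, `k_hop_receptive_field`, `sample_layerwise`) are hypothetical, and the `score` function is a fixed degree-based placeholder standing in for the second GNN that GRAPES trains to predict sampling probabilities.

```python
import random


def make_tree(branching, depth):
    """Build an undirected tree as an adjacency dict (toy graph, assumption)."""
    adj = {0: []}
    frontier = [0]
    next_id = 1
    for _ in range(depth):
        new_frontier = []
        for u in frontier:
            for _ in range(branching):
                adj[u].append(next_id)
                adj[next_id] = [u]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def k_hop_receptive_field(adj, targets, num_layers):
    """All nodes a GNN with `num_layers` layers needs for `targets`."""
    field = set(targets)
    frontier = set(targets)
    for _ in range(num_layers):
        frontier = {v for u in frontier for v in adj[u]} - field
        field |= frontier
    return field


def sample_layerwise(adj, targets, num_layers, budget, score, rng):
    """Add at most `budget` new neighbors per layer.

    Nodes are drawn without replacement, with probability proportional to
    `score(node)`. In GRAPES the probabilities come from a learned GNN;
    here `score` is a hand-written placeholder.
    """
    kept = set(targets)
    for _ in range(num_layers):
        candidates = sorted({v for u in kept for v in adj[u]} - kept)
        weights = [score(v) for v in candidates]
        chosen = set()
        while candidates and len(chosen) < budget:
            i = rng.choices(range(len(candidates)), weights=weights)[0]
            chosen.add(candidates.pop(i))
            weights.pop(i)
        kept |= chosen
    return kept


adj = make_tree(branching=3, depth=4)
rng = random.Random(0)
degree_score = lambda v: len(adj[v])  # placeholder, not a learned policy

for num_layers in (1, 2, 3):
    full = k_hop_receptive_field(adj, [0], num_layers)
    sampled = sample_layerwise(adj, [0], num_layers, 2, degree_score, rng)
    print(num_layers, len(full), len(sampled))
```

On this ternary tree the full receptive field of the root holds 4, 13, and 40 nodes for 1, 2, and 3 layers, while the sampled set holds 3, 5, and 7 nodes, growing linearly with depth. Which nodes end up in that small set is what an adaptive sampler would learn from the downstream task loss, rather than fix in advance as the degree heuristic does here.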