Graph neural networks (GNNs) learn node representations by aggregating information from each node's neighborhood. As these networks grow deeper, their receptive field grows exponentially because the neighborhood expands with every layer, resulting in high memory costs. Graph sampling addresses this memory problem by sampling a small fraction of the nodes in the graph, which allows GNNs to scale to much larger graphs. Most sampling methods rely on fixed sampling heuristics, which may not generalize to different graph structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objective. We evaluate GRAPES on several small- and large-scale graph benchmarks and demonstrate its effectiveness in both accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and can therefore scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
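To make the memory argument concrete, the following is a minimal sketch in plain Python. It is not the GRAPES method: it does not learn sampling probabilities and uses no GFlowNet. It only compares the number of nodes a GNN must load with full neighborhood aggregation against a layer-wise sampler with a fixed per-layer budget. The toy random graph, the function names, and the uniform sampling rule are all assumptions made for illustration; uniform sampling stands in for a fixed heuristic.

```python
import random


def make_random_graph(num_nodes, avg_degree, seed=0):
    """Build an undirected random graph as an adjacency dict (toy data)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(num_nodes)}
    edges_left = num_nodes * avg_degree // 2
    while edges_left > 0:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges_left -= 1
    return adj


def full_receptive_field(adj, targets, num_layers):
    """Count the nodes needed per layer when every neighbor is aggregated."""
    needed = set(targets)
    sizes = [len(needed)]
    for _ in range(num_layers):
        frontier = set()
        for v in needed:
            frontier |= adj[v]
        needed |= frontier
        sizes.append(len(needed))
    return sizes


def sampled_receptive_field(adj, targets, num_layers, budget, rng):
    """Same expansion, but add at most `budget` new nodes per layer.

    Uniform sampling is a placeholder (assumption); an adaptive sampler
    would instead draw nodes according to learned probabilities.
    """
    needed = set(targets)
    sizes = [len(needed)]
    for _ in range(num_layers):
        candidates = set()
        for v in needed:
            candidates |= adj[v]
        candidates -= needed
        k = min(budget, len(candidates))
        # Sort first so the draw is reproducible for a given seed.
        needed |= set(rng.sample(sorted(candidates), k))
        sizes.append(len(needed))
    return sizes


if __name__ == "__main__":
    graph = make_random_graph(num_nodes=5000, avg_degree=10)
    batch = list(range(8))
    print("full:   ", full_receptive_field(graph, batch, num_layers=3))
    print("sampled:",
          sampled_receptive_field(graph, batch, 3, budget=32,
                                  rng=random.Random(0)))
```

With full aggregation, the node count grows roughly by a factor of the average degree per layer until it saturates at the graph size. With a per-layer budget, it grows by at most `budget` nodes per layer, so memory is bounded regardless of depth. Which nodes fill that budget is the choice that an adaptive method would learn.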