Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs continues to grow, \textit{storage-based approaches to GNN training} have been studied, which leverage external storage (e.g., NVMe SSDs) to handle such web-scale graphs on a single machine. Although these storage-based GNN training methods have shown promising potential for large-scale GNN training, we observe that they suffer from a severe bottleneck in data preparation because they overlook a critical challenge: \textit{how to handle a large number of small storage I/Os}. To address this challenge, in this paper we propose a novel storage-based GNN training framework, named \textsf{AGNES}, that employs \textit{block-wise storage I/O processing} to fully utilize the I/O bandwidth of high-performance storage devices. Moreover, to further improve the efficiency of each storage I/O, \textsf{AGNES} employs a simple yet effective strategy, \textit{hyperbatch-based processing}, which exploits the characteristics of real-world graphs. Comprehensive experiments on five real-world graphs show that \textsf{AGNES} consistently outperforms four state-of-the-art methods, running up to 4.1$\times$ faster than the best competitor. Our code is available at \url{https://github.com/Bigdasgit/agnes-kdd26}.
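To make the idea of block-wise storage I/O concrete, the following is a minimal Python sketch, not \textsf{AGNES}'s actual implementation: instead of issuing one small read per requested node feature, requests are grouped by the storage block that holds them, and each distinct block is read exactly once. All names here (\texttt{BLOCK\_SIZE}, \texttt{FEAT\_DIM}, \texttt{gather\_features}) and the fixed-size float32 feature layout are illustrative assumptions, not the paper's API.
\begin{verbatim}
# Sketch only: coalesce per-node feature reads into one aligned
# read per distinct storage block (assumed 4 KiB granularity).
import os
import numpy as np

BLOCK_SIZE = 4096          # assumed NVMe/page-level block size
FEAT_DIM = 128             # assumed per-node feature dimension
FEAT_BYTES = FEAT_DIM * 4  # float32 features, fixed-size records

def gather_features(fd, node_ids):
    """Read features for node_ids with one pread per touched block."""
    out = np.empty((len(node_ids), FEAT_DIM), dtype=np.float32)
    # Collect the set of blocks covering all requested records, so
    # duplicate and neighboring requests share a single storage I/O.
    blocks = set()
    for nid in node_ids:
        off = nid * FEAT_BYTES
        first = off // BLOCK_SIZE
        last = (off + FEAT_BYTES - 1) // BLOCK_SIZE
        blocks.update(range(first, last + 1))
    # One aligned read per distinct block.
    cache = {b: os.pread(fd, BLOCK_SIZE, b * BLOCK_SIZE)
             for b in sorted(blocks)}
    # Slice each feature record out of the cached blocks in memory.
    for i, nid in enumerate(node_ids):
        off = nid * FEAT_BYTES
        first = off // BLOCK_SIZE
        last = (off + FEAT_BYTES - 1) // BLOCK_SIZE
        buf = b"".join(cache[b] for b in range(first, last + 1))
        start = off - first * BLOCK_SIZE
        out[i] = np.frombuffer(buf[start:start + FEAT_BYTES],
                               dtype=np.float32)
    return out
\end{verbatim}
With this layout, a block holds eight 512-byte records, so fetching the features of a sampled minibatch issues at most one read per distinct 4 KiB block rather than one small read per node, which is the kind of coalescing that lets high-performance SSDs approach their full bandwidth.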