Training graph neural networks on large datasets has long been a challenge. Traditional approaches include efficiently representing the whole graph in memory, designing parameter-efficient and sampling-based models, and partitioning the graph in a distributed setup. Separately, graph databases with native graph storage and query engines have been developed, enabling time- and resource-efficient graph analytics workloads. We show how to train a GNN directly on a graph DB by retrieving minimal data into memory and sampling via the query engine. Our experiments show resource advantages for both single-machine and distributed training. Our approach opens up a new way of scaling GNNs as well as a new application area for graph DBs.
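The core idea of sampling through the query engine can be sketched as follows. This is a minimal, hypothetical illustration: the graph DB round-trip is stubbed with an in-memory adjacency map, and the function names (`query_neighbors`, `sample_subgraph`) and the fanout-based sampling scheme are assumptions for illustration, not the paper's actual interface.

```python
import random

# Stand-in for the graph DB's query engine: in a real setup, this
# adjacency would live in the database, and query_neighbors would issue
# a query (e.g. in Cypher or Gremlin) rather than read a Python dict.
ADJACENCY = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

def query_neighbors(node_ids, fanout, seed=None):
    """Simulated DB round-trip: sample up to `fanout` neighbors per node."""
    rng = random.Random(seed)
    edges = []
    for n in node_ids:
        nbrs = ADJACENCY.get(n, [])
        for m in rng.sample(nbrs, min(fanout, len(nbrs))):
            edges.append((n, m))
    return edges

def sample_subgraph(seeds, fanouts, seed=None):
    """Build a multi-hop sampled subgraph by repeated per-hop queries.

    Only the sampled edges ever enter training-process memory; the full
    graph stays in the database.
    """
    frontier, all_edges = list(seeds), []
    for hop, fanout in enumerate(fanouts):
        hop_seed = None if seed is None else seed + hop
        edges = query_neighbors(frontier, fanout, seed=hop_seed)
        all_edges.extend(edges)
        frontier = sorted({m for _, m in edges})  # next hop's frontier
    return all_edges

# One mini-batch: 2-hop neighborhood of seed node 0, fanout 2 per hop.
batch = sample_subgraph([0], fanouts=[2, 2], seed=0)
```

Each call returns only the edges needed for one mini-batch, which is what allows the training process to avoid materializing the whole graph in memory.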