Graph Neural Networks (GNNs) are widely used for learning on graph-structured data, but scaling GNN training to massive graphs remains challenging. To enable scalable distributed training, graphs are divided into smaller partitions that are distributed across multiple machines such that inter-machine communication is minimized and computational load is balanced. In practice, existing partitioning approaches face a fundamental trade-off between partitioning overhead and partitioning quality. We propose EmbedPart, an embedding-driven partitioning approach that achieves both speed and quality. Instead of operating directly on irregular graph structures, EmbedPart leverages node embeddings produced during the actual GNN training workload and clusters these dense embeddings to derive a partitioning. EmbedPart achieves more than 100x speedup over Metis while maintaining competitive partitioning quality and accelerating distributed GNN training. Moreover, EmbedPart naturally supports graph updates and fast repartitioning, and can be applied to graph reordering to improve data locality and accelerate single-machine GNN training. By shifting partitioning from irregular graph structures to dense embeddings, EmbedPart enables scalable and high-quality graph data optimization.
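The core idea in the abstract, clustering dense node embeddings to obtain a balanced partitioning, can be illustrated with a small sketch. The abstract does not say which clustering algorithm EmbedPart uses, so the size-capped k-means below is an assumed stand-in rather than the authors' method. The function names `partition_by_embeddings` and `edge_cut` are hypothetical, and the 2-D embeddings are hand-made toy values rather than embeddings from real GNN training.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_by_embeddings(embeddings, k, iters=10, seed=0):
    """Assign each node to one of k partitions by clustering its embedding.

    Assumption: size-capped k-means stands in for whatever clustering
    EmbedPart actually uses. The cap keeps partition sizes balanced.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    cap = math.ceil(n / k)  # at most `cap` nodes per partition
    rng = random.Random(seed)
    centroids = [list(embeddings[i]) for i in rng.sample(range(n), k)]
    assign = [0] * n
    for _ in range(iters):
        sizes = [0] * k
        for v in range(n):
            # Try centroids from nearest to farthest; take the first with room.
            order = sorted(range(k), key=lambda c: sq_dist(embeddings[v], centroids[c]))
            for c in order:
                if sizes[c] < cap:
                    assign[v] = c
                    sizes[c] += 1
                    break
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [embeddings[v] for v in range(n) if assign[v] == c]
            if members:
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return assign


def edge_cut(edges, assign):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assign[u] != assign[v])


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Hypothetical embeddings: nodes of the same triangle lie close together.
embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

parts = partition_by_embeddings(embeddings, k=2)
print(parts, "edge cut:", edge_cut(edges, parts))
```

The sketch never traverses the graph while partitioning; the edges are used only afterwards to measure the edge cut. Any partition quality therefore depends on how well the embeddings reflect graph locality, which this toy example simply assumes.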