Training Graph Neural Networks (GNNs) at large scale requires significant computational resources and is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE) its own minibatch to process, leading to duplicated computation and duplicated input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, so the amount of work per seed vertex drops significantly as the batch size increases. Hence, it is favorable for processors equipped with a fast interconnect to work on one large minibatch together, as a single larger processor, instead of working on separate smaller minibatches, even though the global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, obtained simply by increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.
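The overlap argument above can be illustrated with a small simulation. The sketch below is not the authors' implementation: the random toy graph, the helper names (`build_toy_graph`, `presample_neighbors`, `sampled_vertices`, `compare`) and all parameter values are hypothetical. It also assumes, for simplicity, that each vertex draws its sampled neighbors once per layer, so a vertex expands the same way whichever batch contains it. Under that assumption it counts how many vertices must be touched when a global batch is split across PEs versus processed as one shared batch.

```python
import random

def build_toy_graph(num_vertices, degree, rng):
    # Hypothetical toy graph: every vertex gets `degree` random neighbors.
    return [[rng.randrange(num_vertices) for _ in range(degree)]
            for _ in range(num_vertices)]

def presample_neighbors(adj, fanout, num_layers, rng):
    # Simplifying assumption: each vertex draws its sampled neighbors once
    # per layer, so it expands the same way no matter which batch holds it.
    return [[rng.sample(nbrs, min(fanout, len(nbrs))) for nbrs in adj]
            for _ in range(num_layers)]

def sampled_vertices(sampled, seeds):
    # Set of vertices a batch must touch: the seeds plus every vertex
    # reached by following the pre-sampled neighbors layer by layer.
    frontier = set(seeds)
    touched = set(seeds)
    for layer in sampled:
        frontier = {u for v in frontier for u in layer[v]}
        touched |= frontier
    return touched

def compare(num_pes=4, global_batch=1024, seed=0):
    rng = random.Random(seed)
    adj = build_toy_graph(20_000, 15, rng)
    sampled = presample_neighbors(adj, fanout=5, num_layers=3, rng=rng)
    seeds = rng.sample(range(len(adj)), global_batch)
    local = global_batch // num_pes
    # Independent: each PE expands its own slice of the seeds, so a vertex
    # shared by several slices is counted (and fetched) once per PE.
    independent = sum(
        len(sampled_vertices(sampled, seeds[p * local:(p + 1) * local]))
        for p in range(num_pes))
    # Cooperative: all PEs share one batch, so each vertex is counted once.
    cooperative = len(sampled_vertices(sampled, seeds))
    return independent, cooperative

independent, cooperative = compare()
print(f"independent: {independent / 1024:.1f} touched vertices per seed")
print(f"cooperative: {cooperative / 1024:.1f} touched vertices per seed")
```

In this toy setting the cooperative count can never exceed the independent total, because the shared batch touches the union of the per-PE vertex sets while the independent scheme pays for their sum. How large the gap is on a real graph depends on its structure, the fanout, and the sampler.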