Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy. However, both GNN inference and GNN training are complex, and they uniquely combine irregular graph processing with dense, regular computations. This complexity makes it very challenging to execute GNNs efficiently on modern massively parallel architectures. To address this challenge, we first design a taxonomy of parallelism in GNNs, considering data parallelism, model parallelism, and different forms of pipelining. Then, we use this taxonomy to investigate the amount of parallelism in numerous GNN models, GNN-driven machine learning tasks, software frameworks, and hardware accelerators. We use the work-depth model, and we also assess communication volume and synchronization. We specifically focus on the sparsity/density of the associated tensors in order to understand how to effectively apply techniques such as vectorization. We also formally analyze GNN pipelining, and we generalize the established Message-Passing class of GNN models to cover arbitrary pipeline depths, facilitating future optimizations. Finally, we investigate different forms of asynchronicity, charting a path toward future asynchronous parallel GNN pipelines. The outcomes of our analysis are synthesized in a set of insights that help to maximize GNN performance, and in a comprehensive list of challenges and opportunities for further research into efficient GNN computations. Our work will help to advance the design of future GNNs.
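To make the combination of irregular graph processing and dense, regular computation concrete, the following is a minimal sketch of a single simplified message-passing layer in plain Python. It is an illustration under stated assumptions, not the formulation used in this work: the function name `gnn_layer`, the sum aggregation with a self-contribution, and the ReLU nonlinearity are all hypothetical choices made for this example.

```python
from typing import Dict, List


def gnn_layer(
    adj: Dict[int, List[int]],
    feats: List[List[float]],
    weight: List[List[float]],
) -> List[List[float]]:
    """One simplified message-passing layer (hypothetical helper).

    Step 1 sums each vertex's own features with those of its neighbors.
    Step 2 applies a dense linear map followed by ReLU.
    """
    n = len(feats)
    k_in = len(feats[0])
    k_out = len(weight[0])

    # Irregular part: the work per vertex depends on its degree,
    # so the memory access pattern follows the graph structure.
    aggregated = []
    for v in range(n):
        acc = list(feats[v])  # assumption: a vertex also contributes to itself
        for u in adj.get(v, []):
            for j in range(k_in):
                acc[j] += feats[u][j]
        aggregated.append(acc)

    # Regular part: every vertex performs the same dense
    # (k_in x k_out) multiplication, which is easy to vectorize.
    out = []
    for v in range(n):
        row = []
        for c in range(k_out):
            s = sum(aggregated[v][j] * weight[j][c] for j in range(k_in))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


# Usage: a path graph 0 - 1 - 2 with 2-dimensional features
# and an identity weight matrix.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
result = gnn_layer(adj, feats, weight)
print(result)  # [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
```

In this sketch, the iterations of both outer loops over vertices are independent of each other, so each loop could in principle run in parallel across vertices. The first loop is load-imbalanced when degrees vary, while the second has uniform cost per vertex.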
