Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy. However, both inference and training of GNNs are complex: they uniquely combine irregular graph processing with dense, regular computations. This combination makes it very challenging to execute GNNs efficiently on modern massively parallel architectures. To address this, we first design a taxonomy of parallelism in GNNs, considering data parallelism, model parallelism, and different forms of pipelining. Then, we use this taxonomy to investigate the amount of parallelism in numerous GNN models, GNN-driven machine learning tasks, software frameworks, and hardware accelerators. We use the work-depth model, and we also assess communication volume and synchronization. We specifically focus on the sparsity/density of the associated tensors, in order to understand how to effectively apply techniques such as vectorization. We also formally analyze GNN pipelining, and we generalize the established Message-Passing class of GNN models to cover arbitrary pipeline depths, facilitating future optimizations. Finally, we investigate different forms of asynchronicity, charting a path toward future asynchronous parallel GNN pipelines. The outcomes of our analysis are synthesized in a set of insights that help to maximize GNN performance, and in a comprehensive list of challenges and opportunities for further research into efficient GNN computations. Our work will help to advance the design of future GNNs.
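
To make the combination of irregular graph processing and dense, regular computation concrete, the following is a minimal sketch of a single GNN-style layer. It is an illustration under our own assumptions, not the formulation analyzed in this work: the function names (`csr_aggregate`, `dense_update`, `gnn_layer`), the sum aggregation, the ReLU nonlinearity, and the CSR graph layout are all chosen here for the example.

```python
# Illustrative sketch only. All names and design choices below are
# hypothetical and are not taken from the paper.

def csr_aggregate(indptr, indices, features):
    """Irregular part: for each vertex, sum the feature vectors of its neighbors.

    The graph is stored in CSR form: the neighbors of vertex v are
    indices[indptr[v]:indptr[v + 1]]. This is a sparse-times-dense product.
    """
    n = len(indptr) - 1
    k = len(features[0])
    out = [[0.0] * k for _ in range(n)]
    for v in range(n):  # vertices are independent of each other
        for e in range(indptr[v], indptr[v + 1]):  # trip count depends on the degree
            u = indices[e]
            for j in range(k):  # feature dimension: regular, vectorizable
                out[v][j] += features[u][j]
    return out


def dense_update(h, weights):
    """Regular part: dense matrix product h @ weights followed by ReLU."""
    inner, cols = len(weights), len(weights[0])
    return [
        [max(0.0, sum(row[t] * weights[t][j] for t in range(inner)))
         for j in range(cols)]
        for row in h
    ]


def gnn_layer(indptr, indices, features, weights):
    """One layer: sparse neighborhood aggregation, then a dense transformation."""
    return dense_update(csr_aggregate(indptr, indices, features), weights)


# Undirected path graph 0 - 1 - 2 in CSR form, with 2 features per vertex.
indptr = [0, 1, 3, 4]
indices = [1, 0, 2, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 2.0]]

result = gnn_layer(indptr, indices, features, weights)
print(result)
```

In this sketch, the loop over edges has a trip count that depends on each vertex's degree (the irregular, sparse part), while the loops over the feature dimension and the dense product have fixed shapes (the regular part, where vectorization applies most directly).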
