Graphs model many real-world phenomena. With the growth of unstructured and semi-structured data, parallelizing graph algorithms has become a necessity. Unfortunately, because their computation, memory access, and communication patterns are inherently irregular, graph algorithms have traditionally been challenging to parallelize. To address this challenge, several libraries, frameworks, and domain-specific languages (DSLs) have been proposed to reduce the parallel-programming burden on users, who are often domain experts. However, existing frameworks for expressing graph algorithms typically target a single architecture. In this paper, we present StarPlat, a graph DSL that lets programmers specify graph algorithms at a high level and generates code for three different backends from the same algorithmic specification. In particular, the DSL compiler generates OpenMP code for multi-core CPUs, MPI code for distributed systems, and CUDA code for many-core GPUs. Since these are three very different parallel programming paradigms, unifying them under a single language is challenging. We share our experience with the language design. Central to our compiler is an intermediate representation that provides a common representation of the high-level program and serves as the starting point for each backend's code generation. We demonstrate the expressiveness of StarPlat by specifying four graph algorithms: betweenness centrality, PageRank, single-source shortest paths, and triangle counting. We evaluate the effectiveness of our approach by comparing the performance of the generated code against that of hand-crafted library code. We find that the generated code is competitive with library-based code in many cases. More importantly, we show that it is feasible to generate efficient code for different target architectures from the same algorithmic specification of a graph algorithm.
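For orientation, triangle counting (one of the four algorithms named in the abstract) can be written by hand for a multi-core CPU roughly as follows. This is a minimal sketch in C++ with an OpenMP reduction. It is neither StarPlat syntax nor output of the StarPlat compiler, and the `CsrGraph` struct and `count_triangles` function are hypothetical names assumed here purely for illustration. The sketch also assumes an undirected graph stored with sorted adjacency lists and both directions of every edge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container for an undirected graph (not a StarPlat type).
// Neighbors of vertex v are adj[offsets[v]] .. adj[offsets[v + 1] - 1],
// sorted in ascending order, with every edge stored in both directions.
struct CsrGraph {
    std::vector<std::size_t> offsets;  // size = number of vertices + 1
    std::vector<int> adj;              // concatenated neighbor lists
};

// Counts triangles {u, v, w} with u < v < w, so each is counted exactly once.
std::int64_t count_triangles(const CsrGraph& g) {
    assert(!g.offsets.empty());
    const int n = static_cast<int>(g.offsets.size()) - 1;
    std::int64_t total = 0;

    // Outer iterations are independent; the reduction merges per-thread counts.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic, 64) reduction(+ : total)
    for (int u = 0; u < n; ++u) {
        for (std::size_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
            const int v = g.adj[i];
            if (v <= u) continue;  // visit each edge once, from its smaller endpoint

            // Intersect the sorted lists N(u) and N(v), keeping only w > v.
            std::size_t a = g.offsets[u], a_end = g.offsets[u + 1];
            std::size_t b = g.offsets[v], b_end = g.offsets[v + 1];
            while (a < a_end && b < b_end) {
                const int x = g.adj[a], y = g.adj[b];
                if (x < y) {
                    ++a;
                } else if (y < x) {
                    ++b;
                } else {
                    if (x > v) ++total;  // common neighbor closes a triangle
                    ++a;
                    ++b;
                }
            }
        }
    }
    return total;
}
```

The irregularity mentioned in the abstract shows up here as uneven per-vertex work, which is why the sketch uses a dynamic schedule. An MPI or CUDA version of the same algorithm would need a different decomposition and different communication, which is the kind of divergence a single high-level specification is meant to hide.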