Sparse graphs are ubiquitous in the real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, the underlying graphs have grown rapidly in size over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. Ideally, programmers and domain experts would focus only on the sequential computation, and a compiler would take care of generating the parallel code automatically. On the other hand, target hardware devices vary widely, and achieving optimal performance often demands coding in specific languages or frameworks. Our goal in this work is a graph DSL that allows domain experts to write almost-sequential code and that generates parallel code for different accelerators from the same algorithmic specification. In particular, we illustrate code generation from the StarPlat graph DSL for NVIDIA, AMD, and Intel GPUs using the CUDA, OpenCL, SYCL, and OpenACC programming languages. Using a suite of ten large graphs and four popular algorithms, we demonstrate the efficacy of StarPlat's versatile code generator.