When programming parallel compute tasks, it is often difficult to write code whose execution order can be guaranteed. The way today's parallel compute hardware, primarily Graphics Processing Units (GPUs), exposes its programming model makes it easy to write code in which one thread reads or modifies data before it should, resulting in a data race. A tool that could verify that such code will execute as expected would therefore be valuable. However, most static analysis performed at the language level must be completely retooled to work on a different language. It would therefore be of great use to perform memory-model verification and analysis of parallel compute code at a lower level, in the intermediate representations that most languages pass through on their way to something the GPU hardware can execute. This work addresses the question of whether enough information survives in these intermediate representations to perform memory-model verification that checks for data races. To answer it, we analyze as a case study the GeSpMM sparse matrix multiplication algorithm, implemented in CUDA C++ with the LLVM compiler and in Julia with CUDA.jl.