Source-to-Source Transformations for GPU Code Generation

GPUs have become essential in modern high performance computing, but programming them correctly remains a significant challenge. This difficulty arises from subtle concurrency bugs that result from the explicit management of synchronization primitives and data movement across intricate hierarchies of memory and parallel threads. At the same time, the ability to control these aspects explicitly is at the core of the performance gains granted by GPUs. These challenges have motivated interest in safe GPU programming: tools and languages that can prevent or detect such bugs statically. However, existing approaches make tradeoffs in three dimensions: the strength of their guarantees, the degree of low-level control they allow, and the amount of additional effort required to achieve these safety guarantees. This thesis presents OptiGPU, a system for GPU programming with strong safety guarantees-data race freedom, deadlock freedom, and full functional correctness-that minimizes tradeoffs in the other two dimensions compared to previous approaches. OptiGPU applies proof-preserving compilation to GPU programming, allowing verification of low-level, optimized GPU programs through refinement of simple, verified CPU programs. An OptiGPU user thus avoids the substantial proof burden of directly verifying complex optimized GPU programs, instead directing this refinement with source-to-source transformations that automatically preserve proofs. OptiGPU is implemented as a set of extensions to OptiTrust, an existing framework for proof-preserving compilation on CPUs. OptiGPU models essential GPU programming features, including kernel launches, shared memory, and synchronous barriers, and produces both device and host-side code. We evaluate OptiGPU on two case studies, matrix transpose and tree-based parallel reduction, showing it can derive CUDA code matching techniques found in handwritten references.

翻译：GPU已成为现代高性能计算中不可或缺的组成部分，但对其进行正确编程仍是一项重大挑战。这一难点源于显式管理同步原语以及在错综复杂的内存层级与并行线程间移动数据时产生的微妙并发错误。同时，对这些方面进行显式控制的能力正是GPU性能提升的核心。这些挑战激发了对安全GPU编程（即能够静态预防或检测此类错误的工具与语言）的研究兴趣。然而，现有方法在三个维度上存在权衡：保证的强度、允许的低级控制程度，以及实现这些安全保证所需的额外工作量。本论文提出OptiGPU——一种具备强安全保证（无数据竞争、无死锁及完全功能正确性）的GPU编程系统，与先前方法相比，它最大程度地减少了其他两个维度的权衡。OptiGPU将保证明编译应用于GPU编程，通过精炼简单的、经过验证的CPU程序，实现对低级优化GPU程序的验证。因此，OptiGPU用户避免了直接验证复杂优化GPU程序的大量证明负担，而是通过自动保留证明的源到源变换来引导这种精炼过程。OptiGPU作为对现有CPU保证明编译框架OptiTrust的一组扩展实现。它建模了包括内核启动、共享内存和同步屏障在内的GPU核心编程特性，并生成设备端与主机端代码。我们通过矩阵转置和基于树的并行归约两个案例研究评估OptiGPU，表明其能推导出手动编写参考资料中存在的CUDA代码技术。