Automatic code optimization is a complex process that typically applies multiple discrete algorithms, each of which modifies the program structure irreversibly. However, these algorithms are often designed monolithically, and because they do not cooperate, similar analyses must be implemented repeatedly. To address this issue, modern optimization techniques such as equality saturation allow exhaustive term rewriting at various levels of the input, thereby simplifying compiler design. In this paper, we propose applying equality saturation to optimize the sequential code used in directive-based GPU programming. Our approach simultaneously reduces computation and memory accesses while achieving high memory throughput. Our fully automated framework constructs single-assignment forms from the input so that it can be rewritten in its entirety while preserving dependencies, and then extracts the optimal variants. Through practical benchmarks, we demonstrate significant performance improvements across several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the importance of memory-access order on modern GPUs.
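To make the core idea of equality saturation concrete, the following is a toy sketch in Python. It is not the framework proposed in this paper. The tuple-based term encoding, the three rewrite rules (taken from the well-known `(a*2)/2` example and ignoring division by zero), and the AST-size cost model are all illustrative assumptions. The sketch shows the three phases: rules add equivalent forms to an e-graph without destroying the original, the process repeats until no rule adds new information, and the cheapest equivalent term is then extracted.

```python
class EGraph:
    """Minimal e-graph: union-find over e-classes plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []     # union-find parent pointers, indexed by e-class id
        self.classes = {}    # canonical e-class id -> set of e-nodes (op, *child_ids)
        self.hashcons = {}   # canonical e-node -> e-class id

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def canon(self, node):
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, node):
        node = self.canon(node)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.classes[cid] = {node}
        self.hashcons[node] = cid
        return cid

    def add_term(self, term):
        op, *kids = term
        return self.add((op, *(self.add_term(k) for k in kids)))

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        self.classes[a] |= self.classes.pop(b)
        return True

    def rebuild(self):
        # Restore congruence: identical canonical e-nodes must share one e-class.
        while True:
            merged = False
            self.hashcons = {}
            for cid in list(self.classes):
                if cid not in self.classes:  # merged away earlier in this pass
                    continue
                nodes = {self.canon(n) for n in self.classes[cid]}
                self.classes[cid] = nodes
                for n in nodes:
                    other = self.hashcons.get(n)
                    if other is not None and self.find(other) != self.find(cid):
                        self.union(other, cid)
                        merged = True
                    self.hashcons[n] = self.find(cid)
            if not merged:
                return

# Rewrite rules (illustrative): each returns builders that add an equivalent form.
def mul_div_assoc(eg, node):  # (x * y) / z  ->  x * (y / z)
    out = []
    if node[0] == '/':
        _, num, z = node
        for inner in eg.classes[eg.find(num)]:
            if inner[0] == '*':
                _, x, y = inner
                out.append(lambda eg, x=x, y=y, z=z: eg.add(('*', x, eg.add(('/', y, z)))))
    return out

def div_self(eg, node):  # x / x  ->  1  (ignores x == 0)
    if node[0] == '/' and eg.find(node[1]) == eg.find(node[2]):
        return [lambda eg: eg.add(('1',))]
    return []

def mul_one(eg, node):  # x * 1  ->  x
    if node[0] == '*' and ('1',) in eg.classes[eg.find(node[2])]:
        return [lambda eg, x=node[1]: eg.find(x)]
    return []

def saturate(eg, rules, max_iters=10):
    """Match all rules, then apply them all; stop when nothing new is learned."""
    for _ in range(max_iters):
        matches = [(cid, build) for cid, nodes in eg.classes.items()
                   for node in nodes for rule in rules for build in rule(eg, node)]
        changed = False
        for cid, build in matches:
            changed |= eg.union(cid, build(eg))
        eg.rebuild()
        if not changed:
            return True  # saturated
    return False  # iteration limit reached

def extract(eg, root):
    """Pick the smallest term (AST node count) in the root's e-class."""
    best = {}  # canonical e-class id -> (cost, term)
    changed = True
    while changed:
        changed = False
        for cid, nodes in eg.classes.items():
            for op, *kids in nodes:
                if all(eg.find(k) in best for k in kids):
                    cost = 1 + sum(best[eg.find(k)][0] for k in kids)
                    if cid not in best or cost < best[cid][0]:
                        best[cid] = (cost, (op, *(best[eg.find(k)][1] for k in kids)))
                        changed = True
    return best[eg.find(root)][1]

if __name__ == "__main__":
    eg = EGraph()
    root = eg.add_term(('/', ('*', ('a',), ('2',)), ('2',)))
    print(saturate(eg, [mul_div_assoc, div_self, mul_one]), extract(eg, root))  # True ('a',)
```

Because rewrites only add e-nodes and merge e-classes, the original `(a*2)/2` remains in the graph alongside `a*(2/2)` and `a`. The choice among them is deferred to extraction, which is what lets equality saturation avoid the irreversible, order-dependent decisions of conventional pass pipelines.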