Automatic code optimization is a complex process that typically applies multiple discrete algorithms, each of which modifies the program structure irreversibly. However, these algorithms are often designed monolithically. Because they do not cooperate, similar analyses must be implemented again and again. Modern optimization techniques such as equality saturation address this issue by allowing exhaustive term rewriting at various levels of the input, which simplifies compiler design. In this paper, we propose applying equality saturation to optimize the sequential code used in directive-based GPU programming. Our approach simultaneously reduces computation and memory accesses while achieving high memory throughput. Our fully automated framework constructs single-assignment forms from the input so that they can be rewritten in their entirety while preserving dependencies. It then extracts the optimal variants. Through practical benchmarks, we demonstrate significant performance improvements across several compilers. Furthermore, we highlight the advantages of computation reordering and emphasize the importance of memory-access order on modern GPUs.
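Equality saturation, as summarized above, rewrites without destroying earlier forms and defers the choice of the best program to a separate extraction step. The following is a minimal Python sketch of that idea on scalar integer expressions. It is not the framework proposed in this paper. The `EGraph` class, the tuple term encoding, the two rewrite rules (factoring `x*y + x*z` into `x*(y + z)` and strength-reducing `x*2` to `x << 1`, valid for integer operands), and the `OP_COST` weights are all hypothetical choices made for illustration. The sketch also omits the incremental rebuilding and more sophisticated extraction that production e-graph libraries use.

```python
class EGraph:
    def __init__(self):
        self.parent = {}   # union-find parent pointers over e-class ids
        self.classes = {}  # canonical e-class id -> set of e-nodes
        self.memo = {}     # hashcons: e-node (op, child_ids...) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            a = self.parent[a]
        return a

    def canon(self, node):  # replace child ids by their canonical ids
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, node):
        node = self.canon(node)
        if node not in self.memo:  # identical e-nodes are stored only once
            cid = self.memo[node] = len(self.parent)  # fresh, never-reused id
            self.parent[cid], self.classes[cid] = cid, {node}
        return self.find(self.memo[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.classes[a] |= self.classes.pop(b)
        return a != b

    def rebuild(self):
        # Re-canonicalize e-nodes; classes that now share an e-node are merged.
        while True:
            self.memo, merges = {}, []
            for cid, nodes in list(self.classes.items()):
                self.classes[cid] = {self.canon(n) for n in nodes}
                for n in self.classes[cid]:
                    merges.append((self.memo.setdefault(n, cid), cid))
            if not any([self.union(a, b) for a, b in merges]):  # list: run all unions
                return

    def ematch(self, pat, cid, subst):
        # Yield each substitution under which pattern `pat` matches e-class `cid`.
        if isinstance(pat, str):  # pattern variable such as "?x"
            if pat not in subst or self.find(subst[pat]) == self.find(cid):
                yield {**subst, pat: cid}
            return
        op, *kids = pat
        for node in self.classes[self.find(cid)]:
            if node[0] == op and len(node) == len(pat):
                substs = [subst]
                for p, k in zip(kids, node[1:]):
                    substs = [s2 for s in substs for s2 in self.ematch(p, k, s)]
                yield from substs

    def instantiate(self, pat, subst):  # add a term, filling variables from subst
        if isinstance(pat, str):
            return subst[pat]
        op, *kids = pat
        return self.add((op, *(self.instantiate(k, subst) for k in kids)))

    def saturate(self, rules, max_iters=10):
        # Rewrites only add equalities; the original forms are never deleted.
        for _ in range(max_iters):
            matches = [(rhs, cid, s) for lhs, rhs in rules
                       for cid in list(self.classes) for s in self.ematch(lhs, cid, {})]
            changed = [self.union(cid, self.instantiate(rhs, s)) for rhs, cid, s in matches]
            self.rebuild()
            if not any(changed):
                return

    def extract(self, root, op_cost):
        # Fixpoint: cheapest (cost, term) per e-class; ops missing from op_cost cost 0.
        best, changed = {}, True
        while changed:
            changed = False
            for cid, nodes in self.classes.items():
                for op, *kids in nodes:
                    if all(k in best for k in kids):
                        c = op_cost.get(op, 0) + sum(best[k][0] for k in kids)
                        if cid not in best or c < best[cid][0]:
                            best[cid], changed = (c, (op, *(best[k][1] for k in kids))), True
        return best[self.find(root)]


OP_COST = {"+": 1, "*": 4, "<<": 1}  # hypothetical per-operation costs
RULES = [  # (lhs, rhs) pairs; strings starting with "?" are pattern variables
    (("+", ("*", "?x", "?y"), ("*", "?x", "?z")), ("*", "?x", ("+", "?y", "?z"))),
    (("*", "?x", ("2",)), ("<<", "?x", ("1",))),  # strength reduction (integers)
]
TERM = ("+", ("*", ("*", ("x",), ("2",)), ("y",)), ("*", ("*", ("x",), ("2",)), ("z",)))

if __name__ == "__main__":
    eg = EGraph()
    root = eg.instantiate(TERM, {})
    eg.saturate(RULES)
    print(eg.extract(root, OP_COST))
```

Under the assumed costs, the input `(x*2)*y + (x*2)*z` costs 17. After saturation, extraction selects `(x << 1) * (y + z)` at a cost of 6. Neither rule alone reaches this result, because `x * 2` and `x << 1` share an e-class and the factored and unfactored sums coexist in the root class. That coexistence is the non-destructive property that distinguishes equality saturation from a fixed sequence of irreversible passes.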