Automatic code optimization is a complex process that typically applies multiple discrete algorithms, each of which irreversibly modifies the program structure. These algorithms are often designed monolithically and, because they do not cooperate, similar analyses must be implemented repeatedly. Modern optimization techniques such as equality saturation address this issue by allowing exhaustive term rewriting at various levels of the input, thereby simplifying compiler design. In this paper, we propose using equality saturation to optimize sequential code used in directive-based programming for GPUs. Our approach simultaneously achieves less computation, fewer memory accesses, and high memory throughput. Our fully automated framework constructs single-assignment forms from the input so that they can be rewritten in their entirety while preserving dependencies, and then extracts optimal cases. Through practical benchmarks, we demonstrate significant performance improvements on several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs.
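To make the idea of exhaustive rewriting followed by extraction concrete, the following is a minimal sketch and not the framework described above. It assumes a toy term language (variables and binary `+`/`*`), two illustrative rules (commutativity and factoring), and an operation-count cost. All names (`rewrites`, `neighbors`, `saturate`, `cost`, `extract`) are hypothetical. Real equality saturation stores equivalent terms compactly in an e-graph; this sketch enumerates them explicitly as a set, which is only feasible for tiny inputs.

```python
# A term is either a variable name (str) or a tuple (op, left, right).
# Assumption: only two rewrite rules, chosen for illustration.

def rewrites(term):
    """Yield terms obtained by applying one rule at the root of `term`."""
    if isinstance(term, str):
        return
    op, l, r = term
    # Commutativity: x op y -> y op x, for + and *.
    if op in ("+", "*"):
        yield (op, r, l)
    # Factoring: a*b + a*c -> a*(b + c), which saves one multiplication.
    if (op == "+" and isinstance(l, tuple) and isinstance(r, tuple)
            and l[0] == "*" and r[0] == "*" and l[1] == r[1]):
        yield ("*", l[1], ("+", l[2], r[2]))

def neighbors(term):
    """Yield terms obtained by one rule application anywhere in `term`."""
    yield from rewrites(term)
    if isinstance(term, tuple):
        op, l, r = term
        for l2 in neighbors(l):
            yield (op, l2, r)
        for r2 in neighbors(r):
            yield (op, l, r2)

def saturate(term, max_terms=10_000):
    """Collect every term reachable by the rules, up to a size limit.

    Unlike a destructive pass pipeline, earlier forms are never discarded.
    """
    seen = {term}
    frontier = [term]
    while frontier and len(seen) < max_terms:
        next_frontier = []
        for t in frontier:
            for n in neighbors(t):
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return seen

def cost(term):
    """Hypothetical cost model: the number of arithmetic operations."""
    if isinstance(term, str):
        return 0
    return 1 + cost(term[1]) + cost(term[2])

def extract(terms):
    """Pick a cheapest equivalent term under `cost`."""
    return min(terms, key=cost)

if __name__ == "__main__":
    # a*b + c*a: factoring applies only after commuting c*a into a*c.
    expr = ("+", ("*", "a", "b"), ("*", "c", "a"))
    best = extract(saturate(expr))
    print(cost(expr), cost(best), best)
```

Because no rewrite is committed until extraction, the factoring opportunity hidden behind a commutation step is still found, which a fixed sequence of destructive passes might miss. Several equally cheap terms can exist, so the specific term returned by `extract` is one of them rather than a canonical form.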