Low-level GPU programming models (CUDA, HIP, OpenCL, etc.) provide detailed control over a program's data flow and execution plan in order to extract close-to-metal performance. However, the intricacies of their syntax and semantics give them a steep learning curve, which reduces programmer productivity. High-level models (OpenMP, OpenACC, etc.), which serve as abstractions over the low-level models, aim to improve programmer productivity, but matching the performance of the low-level models remains a challenge. Both approaches involve inherent trade-offs between productivity, portability, and performance, and no one-size-fits-all solution achieves all three simultaneously. Nevertheless, we believe there is room to improve programmer productivity without sacrificing performance or portability by reusing optimization patterns specific to a given domain. To this end, we propose nomp, a framework for building domain-specific compilers. nomp consists of a pragma-based programming model and a runtime capable of code transformation and generation based on user-provided metadata.