This paper presents FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhanced host-code portability in heterogeneous computing. FLASH takes a novel approach to describing kernels and dispatching them dynamically in a hardware-agnostic manner. Its frontend interfaces are truly hardware-agnostic: they unify compile-time control flow and enforce a portability-optimized code organization that separates computational (performance-critical) code from functional (non-performance-critical) code, and hardware-specific code from hardware-agnostic code, in the host application. We use static code analysis to measure the hardware independence ratio of twelve popular HPC applications and show that up to 99.72% code portability can be achieved with FLASH. Similarly, we measure and compare the complexity of state-of-the-art portable programming models and show that, for two common HPC kernels, FLASH achieves a code reduction of up to 4.0x while maintaining 100% code portability, with a normalized framework overhead of 1% to 13% of the total kernel runtime. The code is available at https://github.com/PSCLab-ASU/FLASH.