Modern computing systems increasingly rely on composing heterogeneous devices to improve performance and efficiency. Programming these systems is often unproductive: algorithm implementations must be coupled to system-specific logic, including device-specific optimizations, partitioning, and inter-device communication and synchronization, so a different program must be developed for each system configuration. We propose the Juno language, which represents general-purpose applications in an imperative form that can be transformed into parallel, optimized, system-specific code using an expressive and granular imperative scheduling language. We also introduce the Hercules compiler, which uses a novel intermediate representation to represent general and device-specific parallel code in a manner that is easy to analyze and manipulate using schedules. Our system achieves performance competitive with hand-optimized device-specific code (geomean speedups of $1.25\times$ and $1.48\times$ on the CPU and GPU, respectively) and significantly outperforms a prior general-purpose heterogeneous programming system (geomean speedups of $9.31\times$ and $16.18\times$ on the CPU and GPU, respectively).