The next-generation Sunway supercomputer employs the SW26010pro processor, which features a specialized on-chip heterogeneous architecture. Applications with significant hotspots can exploit the greatly increased computing capacity of the Sunway many-core architecture, but only through intensive manual many-core parallelization. However, some legacy projects with large codebases, such as CESM, ROMS, and WRF, have no significant hotspots, so the cost of manually porting them to the Sunway architecture is almost unaffordable. To overcome this challenge, we have developed a toolkit named O2ATH. O2ATH forwards GNU OpenMP runtime library calls to Sunway's Athread library, which greatly simplifies parallelization on the Sunway architecture. O2ATH lets users write both MPE and CPE code in a single file and parallelize it with OpenMP directives and attributes. In practice, O2ATH has helped us port two large projects, CESM and ROMS, to the CPEs of the next-generation Sunway supercomputers via the OpenMP offload method. In our experiments, kernel speedups range from 3 to 15 times, yielding whole-application speedups of 3 to 6 times. Furthermore, O2ATH requires significantly fewer code modifications than manually crafting CPE functions. This indicates that O2ATH can greatly improve development efficiency when porting or optimizing large software projects on Sunway supercomputers.
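To illustrate the single-file, directive-based style described above, the following is a minimal sketch in standard C with one OpenMP directive. The kernel `axpy` and the helper `axpy_demo_mismatches` are hypothetical names chosen for illustration and do not come from CESM, ROMS, or O2ATH. The abstract does not state the exact directives, attributes, or build flags that O2ATH expects, so the plain `parallel for` used here is an assumption; on an ordinary compiler the loop simply runs on host threads, or serially if OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * It stands in for one of the many small loops in a code without hotspots. */
void axpy(int n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    /* Standard OpenMP worksharing directive. With GCC, this lowers to calls
     * into the GNU OpenMP runtime; according to the abstract, O2ATH forwards
     * such runtime calls to Athread. Whether this exact directive is the one
     * O2ATH users write is an assumption of this sketch. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Runs the kernel on small made-up data and returns the number of elements
 * that differ from a serial reference result (0 means they agree). */
int axpy_demo_mismatches(void)
{
    enum { N = 1000 };
    static double x[N], y[N], ref[N];

    for (int i = 0; i < N; ++i) {
        x[i] = (double)i;
        y[i] = 1.0;
        ref[i] = 2.0 * x[i] + 1.0; /* serial reference */
    }

    axpy(N, 2.0, x, y);

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (y[i] != ref[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

The point of the sketch is that the loop body stays where it was in the original source and only a directive is added, in contrast to moving the loop into a separately written CPE function.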