With the advent of exascale computing, in which supercomputers perform at least $10^{18}$ IEEE 754 double-precision (64-bit) operations per second, many concerns have been raised regarding the energy consumption of high-performance computing (HPC) code. Recently, Frontier, operated by the Oak Ridge National Laboratory, became the first supercomputer to break the exascale barrier. In total, it contains 9,408 CPUs, 37,632 GPUs, and 8,730,112 cores. This world-leading supercomputer consumes about 21 megawatts, which is remarkable given that it was also ranked first on the Green500 list before being recently replaced. The previous top Green500 machine, MN-3 in Japan, provided 39.38 gigaflops per watt, while Frontier delivered 62.68 gigaflops per watt. All these infrastructure and hardware improvements are just the tip of the iceberg. Energy-aware code is now required to minimize the energy consumption of distributed and/or multi-threaded software. For example, the data movement bottleneck is responsible for between $35\%$ and $60\%$ of a system's energy consumption during intra-node communication. In an HPC environment, additional energy is consumed through inter-node communication. This position paper introduces future research directions for entering the age of energy-aware software. The paper is organized as follows. First, we review related work on energy measurement and energy optimization. Then we propose to focus on two different levels of granularity in energy optimization.