Deep Neural Networks (DNNs) have revolutionized many fields, but deploying them on GPUs often incurs significant energy consumption. Existing methods for reducing GPU energy consumption are either inflexible across hardware or limited by workload constraints; in contrast, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method that generates energy-efficient GPU kernels by incorporating energy efficiency into the search process. To accelerate energy evaluation, we develop an accurate energy cost model based on high-level kernel features. Furthermore, we introduce a dynamic updating strategy for the energy cost model that reduces the need for on-device energy measurements and speeds up the search. Our evaluation demonstrates that the proposed approach generates GPU kernels with up to 21.69% lower energy consumption while maintaining low latency.
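The search procedure sketched in the abstract — scoring candidate kernels with a learned energy cost model built on high-level kernel features, and only occasionally measuring on device to refresh that model — could be outlined roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: all names (`EnergyCostModel`, `extract_features`, `measure_energy`, the tile/unroll candidate encoding) are hypothetical, and the on-device measurement is replaced by a synthetic stand-in.

```python
import random

def extract_features(candidate):
    # Hypothetical high-level kernel features derived from the schedule config.
    tile, unroll = candidate
    return [float(tile), float(unroll), float(tile * unroll)]

class EnergyCostModel:
    """Toy linear model over kernel features, refreshed from sparse measurements."""
    def __init__(self, n_features, lr=1e-4):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, feats):
        return sum(wi * fi for wi, fi in zip(self.w, feats))

    def update(self, feats, measured_energy):
        # One gradient step on squared error -- a stand-in for the paper's
        # "dynamic updating strategy" for the cost model.
        err = self.predict(feats) - measured_energy
        self.w = [wi - self.lr * err * fi for wi, fi in zip(self.w, feats)]

def measure_energy(candidate):
    # Stand-in for a real on-device energy measurement (e.g., via power counters).
    tile, unroll = candidate
    return 100.0 + 2.0 * tile + 1.5 * unroll

def search(candidates, model, measure_every=8):
    """Score most candidates with the cheap model; measure only every k-th one."""
    best, best_cost = None, float("inf")
    for i, cand in enumerate(candidates):
        feats = extract_features(cand)
        if i % measure_every == 0:
            energy = measure_energy(cand)   # occasional on-device measurement
            model.update(feats, energy)     # refresh the cost model
        else:
            energy = model.predict(feats)   # cheap model-based estimate
        if energy < best_cost:
            best, best_cost = cand, energy
    return best

random.seed(0)
cands = [(random.choice([8, 16, 32]), random.choice([1, 2, 4])) for _ in range(64)]
model = EnergyCostModel(n_features=3)
best = search(cands, model)
print(best)
```

The key cost saving in this shape of loop is that only `len(candidates) / measure_every` candidates ever touch the device; the rest are ranked by the feature-based model, which is what makes folding energy into the search tractable.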