As neural networks (NNs) are deployed across diverse sectors, their energy demand grows correspondingly. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that uses Constrained Bayesian Optimization to tune the configurations of individual hardware components so as to conserve energy. Our empirical evaluation uncovers novel facets of the energy-performance trade-off and shows that we can save up to 36% of energy for popular models. We also validate that PolyThrottle quickly converges to near-optimal settings while satisfying application constraints.