As neural networks (NNs) are deployed across diverse sectors, their energy demand grows correspondingly. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that uses Constrained Bayesian Optimization to tune the configurations of individual hardware components so as to reduce energy consumption. Our empirical evaluation uncovers new aspects of the energy-performance trade-off and shows that we can save up to 36% of energy for popular models. We also validate that PolyThrottle can quickly converge towards near-optimal settings while satisfying application constraints.
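To make the optimization problem concrete, the following is a minimal sketch of choosing GPU, memory, and CPU frequencies that minimize energy per inference subject to a latency constraint. It is not PolyThrottle's implementation: the frequency levels, the latency and power models, and all function names are hypothetical, and an exhaustive grid search stands in for Constrained Bayesian Optimization, which would instead find a good configuration from a small number of real measurements.

```python
from itertools import product

# Hypothetical frequency levels in MHz; real devices expose their own steps.
GPU_FREQS = [300, 600, 900, 1200]
MEM_FREQS = [800, 1600, 2100]
CPU_FREQS = [700, 1200, 1900]


def synthetic_latency_ms(gpu, mem, cpu):
    """Toy latency model (assumption): each component adds time inversely to its frequency."""
    return 6000.0 / gpu + 4000.0 / mem + 2000.0 / cpu


def synthetic_power_w(gpu, mem, cpu):
    """Toy power model (assumption): static draw plus a term growing with each frequency."""
    return 2.0 + 8.0 * (gpu / 1200) ** 2 + 3.0 * (mem / 2100) + 2.0 * (cpu / 1900) ** 2


def energy_mj(gpu, mem, cpu):
    """Energy per inference in millijoules: power (W) times latency (ms)."""
    return synthetic_power_w(gpu, mem, cpu) * synthetic_latency_ms(gpu, mem, cpu)


def best_config(latency_budget_ms):
    """Return the lowest-energy (gpu, mem, cpu) tuple meeting the latency budget, or None."""
    feasible = [
        cfg for cfg in product(GPU_FREQS, MEM_FREQS, CPU_FREQS)
        if synthetic_latency_ms(*cfg) <= latency_budget_ms
    ]
    return min(feasible, key=lambda cfg: energy_mj(*cfg)) if feasible else None
```

Calling `best_config(15.0)` returns the feasible frequency triple with the lowest modeled energy, and a budget that no configuration can meet returns `None`. On real hardware each evaluation requires running inference and measuring power, which is why a sample-efficient method is preferable to enumerating the grid.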