Computing systems have shifted towards highly parallel and heterogeneous architectures to tackle the challenges imposed by limited power budgets. These architectures must be supported by novel power management paradigms that address increasing design size, parallelism, and heterogeneity while ensuring high accuracy and low overhead. In this work, we propose a systematic, automated, and architecture-agnostic approach to accurate and lightweight statistical power modeling of the CPU and GPU sub-systems of a heterogeneous platform. The models are aware of dynamic voltage and frequency scaling (DVFS) and are driven by each sub-system's local performance monitoring counters (PMCs). Counter selection is guided by a generally applicable statistical method that identifies the minimal subsets of counters that correlate robustly with power dissipation. Based on the selected counters, we train a set of lightweight linear models that characterize each sub-system over a range of frequencies. These models compose a lookup-table-based system-level model that efficiently captures the non-linearity of power consumption and exhibits desirable responsiveness and decomposability. We validate the system-level model on real hardware by measuring the total energy consumption of an NVIDIA Jetson AGX Xavier platform over a set of benchmarks. The resulting average estimation error is 1.3%, with a maximum of 3.1%. Furthermore, the model's maximum evaluation runtime is 500 ns, which implies a negligible impact on system utilization and makes the model applicable to online dynamic power management (DPM).
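The abstract does not spell out how the lookup-table-based system-level model is evaluated. The Python sketch below illustrates one plausible reading. It assumes that each sub-system has one linear model per DVFS frequency, stored in a dictionary keyed by frequency, and that the inputs are per-interval rates of the already-selected PMCs. It also assumes the system-level estimate is the sum of the CPU and GPU estimates, integrated over sampling intervals to obtain energy. All class and function names (`LinearPowerModel`, `subsystem_power`, `system_power`, `estimated_energy_j`, `relative_error_pct`) and all coefficients and readings are hypothetical placeholders. They are not the authors' implementation or trained values, and the counter-selection step is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class LinearPowerModel:
    """Hypothetical per-frequency model: P = intercept + sum(coef_i * rate_i), in watts."""
    intercept: float
    coefs: Tuple[float, ...]

    def predict(self, rates: Sequence[float]) -> float:
        if len(rates) != len(self.coefs):
            raise ValueError("expected one rate per selected counter")
        return self.intercept + sum(c * r for c, r in zip(self.coefs, rates))


# One lookup table per sub-system: DVFS frequency (MHz) -> model trained at that frequency.
PowerLUT = Dict[int, LinearPowerModel]


def subsystem_power(lut: PowerLUT, freq_mhz: int, rates: Sequence[float]) -> float:
    # The table lookup selects the linear model for the current operating point, so the
    # non-linear dependence on frequency is held by the table rather than by each model.
    return lut[freq_mhz].predict(rates)


@dataclass(frozen=True)
class Sample:
    """One sampling interval: duration plus frequency and PMC rates per sub-system."""
    dt_s: float
    cpu_freq: int
    cpu_rates: Tuple[float, ...]
    gpu_freq: int
    gpu_rates: Tuple[float, ...]


def system_power(cpu_lut: PowerLUT, gpu_lut: PowerLUT, s: Sample) -> float:
    # Decomposable: the system-level estimate is the sum of the sub-system estimates.
    return (subsystem_power(cpu_lut, s.cpu_freq, s.cpu_rates)
            + subsystem_power(gpu_lut, s.gpu_freq, s.gpu_rates))


def estimated_energy_j(cpu_lut: PowerLUT, gpu_lut: PowerLUT, samples: List[Sample]) -> float:
    # Energy (J) = sum over intervals of estimated power (W) * interval length (s).
    return sum(system_power(cpu_lut, gpu_lut, s) * s.dt_s for s in samples)


def relative_error_pct(estimated: float, measured: float) -> float:
    # Percentage error of an energy estimate against a measured reference.
    return abs(estimated - measured) / measured * 100.0


# Placeholder coefficients and readings for illustration only; not trained values.
cpu_lut: PowerLUT = {
    1000: LinearPowerModel(0.5, (2.0, 0.5)),
    2000: LinearPowerModel(1.0, (3.0, 1.0)),
}
gpu_lut: PowerLUT = {
    500: LinearPowerModel(0.2, (1.5,)),
    1000: LinearPowerModel(0.4, (2.5,)),
}
samples = [
    Sample(0.5, 1000, (0.8, 0.2), 500, (0.4,)),
    Sample(0.5, 2000, (0.9, 0.5), 1000, (0.6,)),
]
est = estimated_energy_j(cpu_lut, gpu_lut, samples)
print(f"estimated energy: {est:.2f} J")  # prints "estimated energy: 4.55 J"
```

In this sketch, keying each table on discrete DVFS operating points reduces every evaluation to one lookup plus a short dot product per sub-system. That low per-call cost is what makes such a model a reasonable candidate for online use.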