A Data-Driven Approach to Lightweight DVFS-Aware Counter-Based Power Modeling for Heterogeneous Platforms

Computing systems have shifted towards highly parallel and heterogeneous architectures to cope with limited power budgets. These architectures require novel power-management paradigms that can handle increasing design size, parallelism, and heterogeneity while ensuring high accuracy and low overhead. In this work, we propose a systematic, automated, and architecture-agnostic approach to accurate, lightweight, DVFS-aware statistical power modeling of the CPU and GPU sub-systems of a heterogeneous platform, driven by each sub-system's local performance monitoring counters (PMCs). Counter selection is guided by a generally applicable statistical method that identifies minimal subsets of counters that correlate robustly with power dissipation. Using the selected counters, we train a set of lightweight linear models that characterize each sub-system over a range of frequencies. These models are combined into a lookup-table-based system-level model that efficiently captures the non-linearity of power consumption while remaining responsive and decomposable. We validate the system-level model on real hardware by measuring the total energy consumption of an NVIDIA Jetson AGX Xavier platform across a set of benchmarks. The resulting average estimation error is 1.3%, with a maximum of 3.1%. Furthermore, the model's evaluation runtime is at most 500 ns, implying a negligible impact on system utilization and making it suitable for online dynamic power management (DPM).
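The structure of the lookup-table-based model can be illustrated with a minimal sketch. It assumes that each sub-system (CPU, GPU) has one linear model per DVFS frequency, stored in a table keyed by frequency. Each model predicts power as an intercept plus a weighted sum of PMC event rates, and system power is the sum of the two sub-system estimates. The counter names, frequencies, coefficients, and the "measured" energy below are hypothetical placeholders, not values from the paper; real coefficients would come from regression against measured power. The energy helper uses simple rectangle-rule integration of sampled estimates, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinearPowerModel:
    """Linear model for one sub-system at one DVFS frequency (hypothetical).

    P = intercept + sum_i coefs[i] * rates[i]
    intercept: frequency-dependent baseline power in watts.
    coefs: watts per unit of counter rate (here: per million events/s).
    """
    intercept: float
    coefs: Dict[str, float]

    def predict(self, rates: Dict[str, float]) -> float:
        # Only the selected counters contribute; extra entries in rates are ignored.
        return self.intercept + sum(c * rates[name] for name, c in self.coefs.items())


class SubsystemLUT:
    """Lookup table mapping a DVFS frequency (MHz) to its linear model."""

    def __init__(self, table: Dict[int, LinearPowerModel]):
        self.table = table

    def predict(self, freq_mhz: int, rates: Dict[str, float]) -> float:
        # A dict lookup picks the model trained at the current operating point,
        # so non-linearity across frequencies is captured piecewise.
        try:
            model = self.table[freq_mhz]
        except KeyError:
            raise KeyError(f"no model trained for {freq_mhz} MHz") from None
        return model.predict(rates)


def system_power(cpu: SubsystemLUT, gpu: SubsystemLUT,
                 cpu_freq: int, gpu_freq: int,
                 cpu_rates: Dict[str, float],
                 gpu_rates: Dict[str, float]) -> Dict[str, float]:
    # Decomposable: the total is the sum of per-sub-system estimates,
    # which are also returned individually.
    p_cpu = cpu.predict(cpu_freq, cpu_rates)
    p_gpu = gpu.predict(gpu_freq, gpu_rates)
    return {"cpu": p_cpu, "gpu": p_gpu, "total": p_cpu + p_gpu}


def estimate_energy(power_samples_w: List[float], period_s: float) -> float:
    # Rectangle-rule integration of power estimates sampled every period_s seconds.
    return sum(power_samples_w) * period_s


def relative_error(estimate: float, measured: float) -> float:
    return abs(estimate - measured) / measured


if __name__ == "__main__":
    # All counter names, frequencies, and coefficients are made-up placeholders.
    cpu = SubsystemLUT({
        1200: LinearPowerModel(0.5, {"cycles": 0.001, "instructions": 0.0005}),
        2200: LinearPowerModel(0.9, {"cycles": 0.002, "instructions": 0.001}),
    })
    gpu = SubsystemLUT({
        900: LinearPowerModel(1.0, {"active_cycles": 0.002}),
        1300: LinearPowerModel(1.6, {"active_cycles": 0.003}),
    })
    p = system_power(cpu, gpu, 1200, 900,
                     {"cycles": 1000.0, "instructions": 2000.0},
                     {"active_cycles": 500.0})
    print(p)
    e = estimate_energy([p["total"]] * 10, 0.1)
    print(f"relative error vs. a hypothetical 4.6 J measurement: {relative_error(e, 4.6):.1%}")
```

In this sketch, evaluating the model costs one table lookup plus a few multiply-adds per sub-system. That helps explain why a model of this shape can be cheap enough for online use. The per-sub-system terms also remain available separately, which is what makes the estimate decomposable.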

翻译：计算系统已转向高度并行化和异构架构，以应对有限功耗预算带来的挑战。这类架构须由新型功耗管理范式支撑，需兼顾高精度与低开销的同时，应对日益增长的设计规模、并行度和异构性。本文提出一种系统化、自动化且与架构无关的方法，针对异构平台的CPU与GPU子系统，基于各子系统的本地性能监测计数器（PMC），实现精确且轻量级的DVFS感知统计功耗建模。计数器选择由通用统计方法引导，该方法可识别与功耗强相关的最小计数器子集。基于所选计数器，我们训练一组轻量级线性模型，以刻画各子系统在多个频率范围内的特性。这些模型构成基于查找表的系统级模型，能高效捕获功耗的非线性特征，展现出理想的响应性与可分解性。我们在实际硬件上通过测量NVIDIA Jetson AGX Xavier平台在一组基准测试中的总能耗来验证系统级模型。最终平均估计误差为1.3%，最大误差为3.1%。此外，模型最大评估运行时间为500纳秒，表明其对系统利用率的可忽略影响，并适用于在线动态功耗管理（DPM）。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日