Datacenter capacity is growing exponentially to satisfy the increasing demand for emerging computationally intensive applications, such as deep learning. This trend has led to concerns over datacenters' increasing energy consumption and carbon footprint. The basic prerequisite for optimizing a datacenter's energy and carbon efficiency is accurately monitoring and attributing energy consumption to specific users and applications. Since datacenter servers tend to be multi-tenant, i.e., they host many applications, server- and rack-level power monitoring alone does not provide insight into their resident applications' energy usage and carbon emissions. At the same time, current application-level energy monitoring and attribution techniques are intrusive: they require privileged access to servers and coordinated support in hardware and software, which is not always possible in the cloud. To address this problem, we design WattScope, a system for non-intrusively estimating the power consumption of individual applications using external measurements of a server's aggregate power usage, without requiring direct access to the server's operating system or applications. Our key insight, based on an analysis of production traces, is that the power characteristics of datacenter workloads, e.g., low variability, low magnitude, and high periodicity, make a server's total power consumption highly amenable to disaggregation into application-specific values. WattScope adapts and extends a machine learning-based technique for disaggregating building power and applies it to server- and rack-level power meter measurements in datacenters. We evaluate WattScope's accuracy on a production workload and show that it yields high accuracy, e.g., often <10% normalized mean absolute error, and is thus a potentially useful tool for datacenters in externally monitoring application-level power usage.
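To make the reported accuracy metric concrete, the following is a minimal sketch of how a normalized mean absolute error could be computed for one application's estimated power trace. The abstract does not state the normalization used; dividing the mean absolute error by the mean of the actual power is an assumption here, and the function name `normalized_mae` and the sample wattages are hypothetical illustrations rather than values from the evaluation.

```python
from typing import Sequence


def normalized_mae(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean absolute error divided by the mean actual power.

    Assumption: normalization by the mean of the actual values; the
    abstract does not specify which normalization the authors use.
    """
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    # Average absolute difference between estimated and actual power.
    mae = sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ValueError("mean actual power is zero; cannot normalize")
    return mae / mean_actual


# Hypothetical per-application power samples in watts (not real data).
actual_watts = [40.0, 42.0, 38.0, 40.0]
estimated_watts = [38.0, 44.0, 39.0, 41.0]

nmae = normalized_mae(actual_watts, estimated_watts)
print(f"NMAE = {nmae:.2%}")
```

Under this assumed definition, a value below 0.10 would correspond to the "<10%" figure quoted above.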