The escalating computational demands and energy footprint of GPU-accelerated computing systems complicate informed design and operational decisions. We present the first release of Wattlytics (https://wattlytics.netlify.app), an interactive, browser-based decision-support system. Unlike existing procurement-oriented calculators, Wattlytics integrates benchmark-driven GPU performance scaling, dynamic voltage and frequency scaling (DVFS)-aware piecewise power modeling, and multi-year total cost of ownership (TCO) analysis within a single interactive environment. Users can configure heterogeneous systems across contemporary GPU architectures (GH200, H100, L40S, L40, A40, A100, and L4), select representative scientific workloads (e.g., GROMACS, AMBER), and explore deployment scenarios under varying energy prices, system lifetimes, and frequency-scaling settings. Wattlytics computes multidimensional decision metrics (TCO breakdown, work-per-TCO, power-per-TCO, and work-per-watt-per-TCO) and supports design-space exploration, what-if scenarios, sensitivity analysis (elasticity, Sobol indices, Monte Carlo), and collaborative features to guide realistic cluster design and procurement under uncertainty. We demonstrate selected scenarios comparing deployment strategies under four operational modes: fixed budget, fixed GPU count, fixed performance, and fixed power. Our case studies show that, under budget or energy constraints, optimally deployed energy-efficient GPUs can outperform higher-performance alternatives in overall cost-effectiveness. Wattlytics helps users explore the design parameter space and distinguish between cost-driving and risk-driving factors, turning HPC design into a well-informed and explainable decision-making process.
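To make the work-per-TCO metric and the fixed-budget mode concrete, the following is a minimal sketch under simplifying assumptions. It is not Wattlytics's actual model: it assumes a constant average power draw rather than a DVFS-aware piecewise power curve, counts only purchase price and energy cost, and all names (`GpuOption`, `tco`, `work_per_tco`, `gpus_within_budget`) and numbers are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass
class GpuOption:
    """Hypothetical per-GPU inputs; values are illustrative, not measured."""
    name: str
    unit_price: float   # purchase price per GPU (currency units)
    avg_power_w: float  # assumed constant average power draw (watts)
    work_rate: float    # benchmark throughput per GPU (work units per day)


def tco(gpu: GpuOption, count: int, years: float, price_per_kwh: float) -> float:
    """Simplified TCO: capital cost plus energy cost over the system lifetime."""
    capital = gpu.unit_price * count
    energy_kwh = gpu.avg_power_w * count * years * HOURS_PER_YEAR / 1000.0
    return capital + energy_kwh * price_per_kwh


def work_per_tco(gpu: GpuOption, count: int, years: float, price_per_kwh: float) -> float:
    """Total work delivered over the lifetime divided by the simplified TCO."""
    total_work = gpu.work_rate * count * years * 365
    return total_work / tco(gpu, count, years, price_per_kwh)


def gpus_within_budget(gpu: GpuOption, budget: float, years: float, price_per_kwh: float) -> int:
    """Fixed-budget mode: largest GPU count whose simplified TCO fits the budget."""
    return int(budget // tco(gpu, 1, years, price_per_kwh))


if __name__ == "__main__":
    fast = GpuOption("fast-gpu", unit_price=30000.0, avg_power_w=700.0, work_rate=100.0)
    frugal = GpuOption("frugal-gpu", unit_price=8000.0, avg_power_w=300.0, work_rate=40.0)
    for gpu in (fast, frugal):
        n = gpus_within_budget(gpu, budget=500000.0, years=5, price_per_kwh=0.30)
        print(gpu.name, n, work_per_tco(gpu, max(n, 1), years=5, price_per_kwh=0.30))
```

In this simplified model, work-per-TCO does not depend on the GPU count, because both work and cost scale linearly with it; the count matters only through the budget constraint. A fuller model with node-level overheads or nonlinear scaling would not have this property.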