Datacenters are vital to our digital society, but they consume a considerable fraction of global electricity, and demand is projected to increase. To improve their sustainability and performance, we envision that simulators will become primary decision-making tools. However, unlike simulators in other fields focusing on key societal infrastructure, such as waterworks and mass transit, datacenter simulators do not yet combine multiple independent models in their operation and thus suffer from issues associated with singular models, such as specialization and lack of adaptability to operational phenomena. To address this challenge, we propose M3SA, a datacenter simulation and analysis framework that uses discrete-event simulation to predict, for each model, the impact on climate and performance under various realistic datacenter conditions, and then combines these predictions. We design an architecture for simulating multiple concurrent models (Multi-Model), a technique for integrating the results of multiple models into a Meta-Model, and a procedure for quantifying Meta-Model accuracy. Through experiments with an M3SA prototype, we show that (i) M3SA can reproduce and enhance peer-reviewed experiments; (ii) M3SA can predict operational phenomena (e.g., failures) of datacenters running fundamentally different workload traces; and (iii) M3SA enables various types of what-if and how-to analyses, such as how to configure CO2-aware migration over yearly energy-production patterns. M3SA has been integrated into the open-source simulator OpenDC and is available at: https://github.com/atlarge-research/opendc-m3sa.
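To make the Multi-Model and Meta-Model idea concrete, the following minimal sketch combines the per-timestep predictions of several models into a single series and scores it against a reference. It is an illustration under stated assumptions, not the M3SA implementation: the function names `build_meta_model` and `mape`, the choice of the mean as the aggregation function, the use of mean absolute percentage error as the accuracy metric, and all numeric values are hypothetical and do not come from the abstract.

```python
from statistics import mean


def build_meta_model(predictions, aggregate=mean):
    """Combine per-model time series into one series, timestep by timestep.

    `predictions` maps a (hypothetical) model name to its predicted values.
    `aggregate` is an assumed aggregation function; the mean is only one option.
    """
    series = list(predictions.values())
    if not series:
        raise ValueError("need at least one model")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all models must predict the same number of timesteps")
    # zip(*series) yields, for each timestep, the tuple of all model predictions.
    return [aggregate(step) for step in zip(*series)]


def mape(predicted, reference):
    """Mean absolute percentage error, in percent (assumed accuracy metric)."""
    if len(predicted) != len(reference):
        raise ValueError("series must have the same length")
    return 100.0 * mean(abs(p - r) / abs(r) for p, r in zip(predicted, reference))


# Made-up power predictions (in watts) from three hypothetical models.
predictions = {
    "model_a": [100.0, 120.0, 140.0],
    "model_b": [110.0, 130.0, 150.0],
    "model_c": [90.0, 110.0, 160.0],
}
# Made-up reference measurements for the same three timesteps.
reference = [100.0, 125.0, 150.0]

meta = build_meta_model(predictions)
print("meta-model:", meta)
print("MAPE (%):", round(mape(meta, reference), 2))
```

Passing a different `aggregate` (for example `statistics.median`) changes how strongly a single outlying model influences the combined prediction, which is the kind of trade-off an accuracy procedure would need to quantify.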