Datacenters are the backbone of our digital society, but they raise numerous operational challenges. We envision digital twins becoming primary instruments in datacenter operations, continuously and autonomously supporting major operational decisions and adapting ICT infrastructure live, with a human in the loop. Although fields such as aviation and autonomous driving successfully employ digital twins, an open-source digital twin for datacenters has not yet been demonstrated to the community. Addressing this challenge, we design, implement, and experiment with OpenDT, an Open-source Digital Twin for monitoring and operating datacenters through a continuous integration cycle that includes: (1) live and continuous telemetry data; (2) discrete-event simulation using live telemetry from the physical ICT, with self-calibration; and (3) SLO-aware and human-approved feedback to physical ICT. Through trace-driven experiments with a prototype mainly covering stages 1 and 2 of the cycle, we show that (i) OpenDT can be used to reproduce peer-reviewed experiments and extend the analysis with performance and energy-efficiency results; and (ii) OpenDT's online re-calibration can increase digital-twinning accuracy, achieving a mean absolute percentage error (MAPE) of 4.39% vs. 7.86% in peer-reviewed work. OpenDT adheres to FAIR/FOSS principles and is available at: https://github.com/atlarge-research/opendt/tree/hcp.
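For readers unfamiliar with the accuracy metric, the following is a minimal sketch of how a mean absolute percentage error can be computed between measured and simulated values. It is not taken from the OpenDT code base: the function name `mape`, the variable names, and the sample numbers are hypothetical, and the choice of power draw in watts as the compared quantity is an assumption, since the abstract does not state which quantity the error is measured over.

```python
def mape(actual, predicted):
    """Return the mean absolute percentage error between two sequences, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # The percentage error is undefined when the reference value is zero.
            raise ValueError("actual values must be non-zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Hypothetical samples (assumed to be power draw in watts), for illustration only.
measured_power_w = [100.0, 200.0, 400.0]
simulated_power_w = [110.0, 190.0, 400.0]

print(f"MAPE: {mape(measured_power_w, simulated_power_w):.2f}%")  # prints "MAPE: 5.00%"
```

A lower value means the simulated output tracks the measurements more closely, which is the sense in which 4.39% improves on 7.86%.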