Data center cooling systems consume significant auxiliary energy, yet optimization studies rarely quantify the gap between theoretically optimal and operationally deployable control strategies. This paper develops a digital twin of the liquid cooling infrastructure at the Frontier exascale supercomputer, whose high-temperature water system comprises three parallel subloops, each serving a dedicated cluster of coolant distribution units through plate heat exchangers and variable-speed pumps. The surrogate model is built in Modelica and validated against one full calendar year of 10-minute operational data in accordance with ASHRAE Guideline 14. For each subloop, the model achieves a coefficient of variation of the root mean square error (CV(RMSE)) below 2.7% and a normalized mean bias error (NMBE) within 2.5%. Using this validated surrogate model, a layered optimization framework evaluates three progressively constrained strategies: analytical flow-only optimization achieves total energy savings of 20.4%; unconstrained joint optimization of flow rate and supply temperature achieves 30.1%; and ramp-constrained joint optimization of flow rate and supply temperature, which enforces actuator rate limits, achieves 27.8%. The analysis reveals that the baseline system operates at 2.9 times the minimum thermally safe flow rate, and that co-optimizing supply temperature with flow rate raises the savings to roughly 1.5 times those achievable by flow reduction alone (30.1% versus 20.4%).
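The two validation metrics named in the abstract, the coefficient of variation of the root mean square error and the normalized mean bias error, can be illustrated with a minimal Python sketch. It is not the authors' validation code. It assumes the commonly used ASHRAE Guideline 14 form with residuals taken as measured minus simulated and a degrees-of-freedom adjustment `p = 1`. The function names `cv_rmse` and `nmbe` and the sample values are hypothetical.

```python
import math


def cv_rmse(measured, simulated, p=1):
    """CV(RMSE) in percent: RMSE of the residuals divided by the measured mean.

    Assumption: the denominator uses (n - p) with p = 1, a common choice
    for calibrated simulation models.
    """
    n = len(measured)
    mean_measured = sum(measured) / n
    sum_sq_err = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sum_sq_err / (n - p)) / mean_measured


def nmbe(measured, simulated, p=1):
    """NMBE in percent: summed residuals normalized by (n - p) * measured mean.

    Assumption: residual = measured - simulated, so a positive value means
    the model under-predicts on average.
    """
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_measured)


# Hypothetical 10-minute samples (arbitrary units), not data from the paper.
measured = [100.0, 100.0, 100.0, 100.0, 100.0]
simulated = [98.0, 98.0, 98.0, 98.0, 98.0]

print(f"CV(RMSE) = {cv_rmse(measured, simulated):.2f}%")
print(f"NMBE     = {nmbe(measured, simulated):.2f}%")
```

A constant offset such as the one in the sample data shows up in both metrics, whereas zero-mean scatter raises CV(RMSE) while leaving NMBE near zero, which is why the two are reported together.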