This paper optimizes the configuration of large-scale data centers toward cost-effective, reliable, and sustainable cloud supply chains. The problem involves placing incoming racks of servers within a data center to maximize demand coverage subject to space, power, and cooling restrictions. We formulate an online integer optimization model to support rack placement decisions. We propose a tractable online sampling optimization (OSO) approach to multi-stage stochastic optimization, which approximates unknown parameters with a sample path and re-optimizes decisions dynamically. We prove that OSO achieves a strong competitive ratio in canonical online resource allocation problems and sublinear regret in the online batched bin packing problem. Theoretical and computational results show that OSO can outperform mean-based certainty-equivalent re-solving heuristics. Our algorithm has been packaged into a software solution deployed across Microsoft's data centers, contributing an interactive decision-making process at the human-machine interface. Using deployment data, econometric tests suggest that adoption of the solution has a negative and statistically significant effect on power stranding, that is, it reduces power stranding by an estimated 1-3 percentage points. At the scale of cloud computing, these improvements in data center performance result in significant cost savings and environmental benefits.
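The sample-and-re-optimize idea behind OSO can be illustrated on a toy problem. The sketch below is not the paper's rack placement model or its deployed algorithm. It assumes a simplified online resource allocation setting in which unit-size requests arrive one at a time, each with a reward drawn from a known distribution, and at most `capacity` requests can be accepted. At each period it samples a single future path, solves the resulting deterministic problem (keep the largest rewards), and accepts the current request only if that solution includes it. All names (`oso_accept`, `run_oso`, `sample_reward`) are hypothetical.

```python
import random


def oso_accept(reward, remaining_capacity, periods_left, sample_reward, rng):
    """Decide whether to accept the current unit-size request.

    Samples ONE path of future rewards, then solves the deterministic
    problem on (current request + sampled future). With unit sizes, the
    optimum keeps the `remaining_capacity` largest rewards, so the current
    request is accepted iff fewer than that many sampled rewards beat it.
    """
    if remaining_capacity <= 0:
        return False
    future = [sample_reward(rng) for _ in range(periods_left)]
    better = sum(1 for r in future if r > reward)
    return better < remaining_capacity


def run_oso(rewards, capacity, sample_reward, seed=0):
    """Run the toy policy over a realized reward sequence.

    Returns the indices of accepted requests. A fresh sample path is drawn
    at every period, so decisions are re-optimized dynamically.
    """
    rng = random.Random(seed)
    remaining = capacity
    accepted = []
    horizon = len(rewards)
    for t, reward in enumerate(rewards):
        periods_left = horizon - t - 1
        if oso_accept(reward, remaining, periods_left, sample_reward, rng):
            accepted.append(t)
            remaining -= 1
    return accepted


if __name__ == "__main__":
    # Assumed reward distribution for illustration: uniform on [0, 1).
    uniform = lambda rng: rng.random()
    demo_rng = random.Random(42)
    realized = [demo_rng.random() for _ in range(20)]
    chosen = run_oso(realized, capacity=5, sample_reward=uniform)
    print("accepted periods:", chosen)
    print("collected reward:", sum(realized[t] for t in chosen))
```

A mean-based certainty-equivalent heuristic would instead replace the unknown future rewards with their expected values before solving. The toy above differs only in that step, drawing one random path in place of the mean.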