Energy demand from data centers has surged in recent years, stressing the electric grid. Because grids must balance supply and demand second by second, large loads such as data centers are increasingly asked to provide demand response, that is, to reduce load on request. Data centers can deliver this reduction by rescheduling jobs on physical machines, but the reduction achievable in real time is uncertain because resource utilization fluctuates, and rescheduling incurs quality-of-service (QoS) losses that providers are unwilling to disclose. We propose a restless multi-armed bandit (RMAB) framework in which the grid operator requests load reductions without access to detailed job-rescheduling procedures. Using open-source virtual machine (VM) datasets, we model job arrivals and rescheduling at each data center as a restless arm governed by a Markov decision process (MDP), and we derive Whittle-index-based policies from transition functions learned via Thompson sampling. Because learning slows as the state space grows, we use a mixed strategy that adds a global upper confidence bound (UCB) and encodes trust indices to improve robustness and accelerate learning. Results show that the proposed mixed-strategy algorithm remains robust across state-space sizes and consistently outperforms the pure Thompson-Whittle (TW) algorithm, especially when contextual information is noisy. It also outperforms the state-of-the-art EXP4 framework. We provide open-source code to ensure reproducibility.
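To make the Thompson-sampling step concrete, the sketch below shows one common way to learn a transition function by posterior sampling: keep transition counts per state and draw a plausible transition matrix from a Dirichlet posterior. It is a minimal illustration under assumptions that are not stated in the abstract: the Dirichlet prior, the two-state example, and the names `sample_transition_matrix` and `update_counts` are all hypothetical. It does not reproduce the authors' implementation, and the Whittle-index computation that would consume the sampled matrix is omitted.

```python
import random


def update_counts(counts, state, next_state):
    """Record one observed transition state -> next_state (hypothetical helper)."""
    counts[state][next_state] += 1


def sample_transition_matrix(counts, prior=1.0, rng=None):
    """Draw one transition matrix from a row-wise Dirichlet posterior.

    counts[s][s2] holds the number of observed s -> s2 transitions for one arm
    under one action. The symmetric Dirichlet prior is an assumption made for
    this sketch. Each row is sampled by normalising independent Gamma draws,
    which is a standard way to generate a Dirichlet sample.
    """
    rng = rng or random.Random()
    matrix = []
    for row in counts:
        draws = [rng.gammavariate(prior + c, 1.0) for c in row]
        total = sum(draws)
        matrix.append([d / total for d in draws])
    return matrix


# Toy usage: a two-state arm with four observed transitions.
rng = random.Random(0)
counts = [[0, 0], [0, 0]]
for s, s2 in [(0, 1), (1, 1), (1, 0), (0, 1)]:
    update_counts(counts, s, s2)

sampled = sample_transition_matrix(counts, rng=rng)
for row in sampled:
    print([round(p, 3) for p in row])
```

In a Thompson-Whittle loop of the kind the abstract describes, a matrix sampled this way would stand in for the unknown dynamics when computing each arm's index, and the counts would be updated after every observed transition. With few observations the samples vary widely, which drives exploration; as counts grow, the samples concentrate around the empirical transition frequencies.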