In recent years, energy demand from data centers has surged, placing stress on the electric grid. Because grids must balance supply and demand every second, operators seek demand response (load reduction) from large loads, including data centers. Data centers can deliver such reductions by rescheduling jobs on physical machines. However, real-time implementation is uncertain because resource utilization fluctuates, and rescheduling incurs quality-of-service (QoS) losses that providers are unwilling to disclose. We propose a restless multi-armed bandit (RMAB) framework in which the grid operator requests load reductions without access to detailed job-rescheduling procedures. Using open-source virtual machine (VM) datasets, we model job arrivals and rescheduling at each data center as a restless arm in a Markov decision process (MDP) and derive Whittle-index-based policies from transition functions learned via Thompson sampling. Because learning slows as the state space grows, we use a mixed strategy that incorporates a global upper confidence bound (UCB) encoded with trust indices to improve robustness and accelerate learning. Results show that the proposed mixed-strategy algorithm remains robust across state-space sizes and consistently outperforms the pure Thompson-Whittle (TW) algorithm, especially when contextual information is noisy. It also outperforms the state-of-the-art EXP4 framework. We provide open-source code for reproducibility.
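The pure Thompson-Whittle (TW) baseline described above combines two steps: sampling each arm's transition function from a posterior, then ranking arms by the Whittle index computed on the sampled model. The block below is a minimal Python sketch of that loop under stated assumptions, not the authors' implementation. The following are all hypothetical: the discretized utilization levels, the reward table `R` (entries in [0, 1]), the Dirichlet priors, the budget `k` of arms asked to reduce load per step, and every function name. The index is found by bisection on a passive subsidy, which presumes each arm is indexable. The mixed strategy with a global UCB and trust indices is not reproduced, since the abstract does not specify it.

```python
import random

# Hypothetical setup: each arm (data center) has n discretized utilization levels;
# action 0 = passive (no request), action 1 = grid operator requests a load reduction.


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def q_values(P, R, lam, gamma=0.95, iters=100):
    """Truncated discounted value iteration for one arm whose passive action earns an
    extra subsidy lam. P[a][s][t] = Pr(t | s, a), R[a][s] = reward. Returns Q[a][s]."""
    n = len(R[0])
    V = [0.0] * n
    Q = [[0.0] * n for _ in (0, 1)]
    for _ in range(iters):
        Q = [[R[a][s] + (lam if a == 0 else 0.0)
              + gamma * sum(P[a][s][t] * V[t] for t in range(n))
              for s in range(n)]
             for a in (0, 1)]
        V = [max(Q[0][s], Q[1][s]) for s in range(n)]
    return Q


def whittle_index(P, R, s, lo=-25.0, hi=25.0, tol=1e-2):
    """Bisection for the passive subsidy at which both actions tie in state s.
    Assumes indexability and rewards in [0, 1], so the index lies inside [lo, hi]."""
    while hi - lo > tol:
        lam = (lo + hi) / 2
        Q = q_values(P, R, lam)
        if Q[1][s] > Q[0][s]:
            lo = lam  # active still preferred, so the index is above lam
        else:
            hi = lam
    return (lo + hi) / 2


def thompson_whittle_step(counts, R, states, k, rng):
    """One decision epoch: sample a transition model per arm from its Dirichlet posterior
    (counts[i][a][s] = pseudo-counts over next states), score each arm by the Whittle
    index of its current state, and return the set of k arms asked to reduce load."""
    n = len(R[0])
    scores = []
    for i, s in enumerate(states):
        P = [[sample_dirichlet(counts[i][a][x], rng) for x in range(n)] for a in (0, 1)]
        scores.append((whittle_index(P, R, s), i))
    scores.sort(reverse=True)
    return {i for _, i in scores[:k]}


def simulate(true_P, R, k, horizon, seed=0):
    """Run the policy against hypothetical ground-truth dynamics true_P[i][a][s][t];
    return the total reward and the learned pseudo-counts."""
    rng = random.Random(seed)
    n_arms, n = len(true_P), len(R[0])
    counts = [[[[1.0] * n for _ in range(n)] for _ in (0, 1)] for _ in range(n_arms)]
    states = [0] * n_arms
    total = 0.0
    for _ in range(horizon):
        active = thompson_whittle_step(counts, R, states, k, rng)
        for i in range(n_arms):
            a = 1 if i in active else 0
            s = states[i]
            total += R[a][s]
            nxt = rng.choices(range(n), weights=true_P[i][a][s])[0]
            counts[i][a][s][nxt] += 1.0  # posterior update with the observed transition
            states[i] = nxt
    return total, counts


if __name__ == "__main__":
    # Invented numbers for illustration only; rewards must stay in [0, 1].
    R = [[0.0, 0.0, 0.0],  # passive: no reduction delivered
         [1.0, 0.6, 0.2]]  # active: hypothetical reduction benefit net of QoS cost
    stay = [[0.7, 0.3, 0.0], [0.2, 0.6, 0.2], [0.0, 0.3, 0.7]]
    shed = [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]]
    true_P = [[stay, shed], [stay, shed], [stay, shed]]
    total, _ = simulate(true_P, R, k=1, horizon=20)
    print(f"total reward over 20 steps: {total:.2f}")
```

Recomputing every index from a fresh posterior sample at each step keeps the sketch close to plain Thompson sampling. It also shows why cost grows with the state space: each index call runs a full value iteration per bisection step.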