As rapidly growing AI computational demands accelerate the need to install and maintain new hardware, this work studies data center resource management through strategic rack placement that balances operational efficiency against fault tolerance across diverse resources and locations. Traditional mixed-integer programming (MIP) approaches often struggle to scale, while heuristic methods can yield significantly suboptimal solutions. To address these issues, this paper presents a novel two-tier optimization framework in which a high-level deep reinforcement learning (DRL) model guides a low-level gradient-based heuristic for local search. The high-level DRL agent uses a Leader Reward to select the ordering of rack types, and the low-level heuristic efficiently maps racks to positions, minimizing the number of rack movements and ensuring a fault-tolerant distribution of resources. The approach scales to over 100,000 positions and 100 rack types. Our method outperformed the gradient-based heuristic by 7\% on average and the MIP solver by over 30\% in objective value. It achieved a 100\% success rate versus 97.5\% for MIP (within a 20-minute limit) and completed in just 2 minutes compared with MIP's 1630 minutes, a speedup of roughly 800 times (nearly three orders of magnitude). Unlike the MIP solver, whose performance varied under time constraints and high penalties, our algorithm consistently delivered stable, efficient results, an essential property for large-scale data center management.
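The division of labor between the two tiers can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the authors' implementation. The outer tier is plain random search over rack-type orderings, standing in for the DRL agent (the Leader Reward is not modeled). The inner tier is a greedy placement rule, standing in for the gradient-based heuristic. The hypothetical `exposure` score counts, for each rack type, how many of its racks sit in its most heavily loaded fault domain. Movement counts are ignored.

```python
import random
from collections import Counter, defaultdict


def place(order, demand, positions):
    """Inner tier (hypothetical greedy stand-in for the gradient-based
    heuristic): place rack types in the given order, putting each rack in
    the free position whose fault domain holds the fewest racks of that
    type so far, breaking ties by lowest position id."""
    free = sorted(positions)  # (position_id, fault_domain) pairs
    assignment = defaultdict(list)
    for rack_type in order:
        per_domain = Counter()
        for _ in range(demand[rack_type]):
            if not free:
                return None  # infeasible: not enough positions left
            best = min(free, key=lambda p: (per_domain[p[1]], p[0]))
            free.remove(best)
            per_domain[best[1]] += 1
            assignment[rack_type].append(best)
    return dict(assignment)


def exposure(assignment):
    """Hypothetical fault-tolerance score (lower is better): for each rack
    type, the racks lost if its most-loaded fault domain fails, summed."""
    return sum(
        max(Counter(domain for _, domain in racks).values())
        for racks in assignment.values()
    )


def search_orderings(demand, positions, iters=200, seed=0):
    """Outer tier: random search over rack-type orderings (a stand-in for
    the DRL agent). Evaluates the initial ordering first, then shuffles."""
    rng = random.Random(seed)
    order = list(demand)
    best_order, best_score = None, float("inf")
    for _ in range(iters):
        assignment = place(order, demand, positions)
        if assignment is not None:
            score = exposure(assignment)
            if score < best_score:
                best_order, best_score = list(order), score
        order = order[:]
        rng.shuffle(order)
    return best_order, best_score


# Toy instance: three fault domains, with domain "C" much smaller.
positions = (
    [(i, "A") for i in range(4)]
    + [(i + 4, "B") for i in range(4)]
    + [(i + 8, "C") for i in range(2)]
)
demand = {"gpu": 4, "storage": 3, "network": 3}
order, score = search_orderings(demand, positions)
print(order, score)
```

On this toy instance, only orderings that place `gpu` last reach an exposure of 4, while the other orderings score 5: placing the large `gpu` type before either smaller type uses up capacity in the small domain `C`, so a later type can no longer spread across all three domains. This is the kind of ordering decision the outer tier is meant to make, with the inner tier doing the actual position assignment.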