The growing energy demand and carbon footprint of large-scale AI call for intelligent workload management across globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPU, GPU, memory, and HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets (AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters) with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. The benchmark poses a challenging scheduling problem: tasks arrive with resource and service-level agreement (SLA) requirements, and a top-level coordinating agent must dynamically reassign or defer them across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables explicit study of the trade-offs among carbon emissions, energy costs, service-level agreements, and water use. DCcluster-Opt provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and a fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
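To make the scheduling problem and the modular reward more concrete, the following is a minimal, self-contained sketch and not DCcluster-Opt's actual API. It shows a rule-based "lowest carbon, else defer" assignment and a weighted multi-objective reward. Every name here (`RewardWeights`, `modular_reward`, `lowest_carbon_assignment`, `DEFER`, and the metric keys) is a hypothetical chosen for illustration, and the linear weighting is an assumed form of trade-off, not one stated in the abstract.

```python
from dataclasses import dataclass

# Hypothetical sentinel meaning "hold the task for a later time step".
DEFER = -1


@dataclass
class RewardWeights:
    """Illustrative weights for the four objectives named in the abstract."""
    carbon: float = 1.0
    energy_cost: float = 1.0
    sla: float = 1.0
    water: float = 1.0


def modular_reward(metrics: dict, weights: RewardWeights) -> float:
    """Negative weighted sum of per-step penalties (assumed linear form)."""
    penalty = (
        weights.carbon * metrics["carbon_kg"]
        + weights.energy_cost * metrics["energy_cost_usd"]
        + weights.sla * metrics["sla_violations"]
        + weights.water * metrics["water_l"]
    )
    return -penalty


def lowest_carbon_assignment(task_cores, carbon_intensity, free_cores):
    """Pick the feasible data center with the lowest grid carbon intensity.

    Returns the index of the chosen data center, or DEFER when no data
    center currently has enough free cores for the task.
    """
    feasible = [i for i, free in enumerate(free_cores) if free >= task_cores]
    if not feasible:
        return DEFER
    return min(feasible, key=lambda i: carbon_intensity[i])


if __name__ == "__main__":
    carbon_intensity = [420.0, 95.0, 310.0]  # gCO2/kWh per site (made-up values)
    free_cores = [64, 8, 32]                 # available cores per site (made-up)

    for cores in (4, 16, 128):
        choice = lowest_carbon_assignment(cores, carbon_intensity, free_cores)
        print(f"task needing {cores} cores -> {choice}")

    step_metrics = {
        "carbon_kg": 2.0,
        "energy_cost_usd": 1.5,
        "sla_violations": 1,
        "water_l": 4.0,
    }
    weights = RewardWeights(carbon=1.0, energy_cost=0.5, sla=10.0, water=0.1)
    print("reward:", modular_reward(step_metrics, weights))
```

Changing the weights in this sketch shifts how much the agent is penalized for each objective, which is the kind of trade-off study the modular reward is described as enabling. A real controller would also need to account for transmission cost, network delay, and SLA deadlines when deferring.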