The rapid growth of AI training has sharply increased datacenter traffic demand and energy consumption, renewing interest in optical circuit switches (OCSes) as a high-bandwidth, energy-efficient alternative for AI fabrics. Deploying multiple OCSes in parallel is a leading approach. However, efficiently scheduling time-varying traffic matrices across parallel optical switches with non-negligible reconfiguration delays remains an open challenge. We consider the problem of scheduling a single AI traffic demand matrix $D$ over $s$ parallel OCSes so as to minimize the makespan under a reconfiguration delay $\delta$. Our algorithm, Spectra, takes a three-step approach: Decompose $D$ into a minimal set of weighted permutations; Schedule these permutations across the parallel switches using load-aware assignment; then Equalize the imbalanced loads on the switches via controlled permutation splitting. Evaluated on realistic AI training workloads (a GPT model and Qwen MoE expert routing) as well as standard benchmarks, Spectra vastly outperforms a baseline built from state-of-the-art algorithms, reducing schedule makespan by an average factor of $1.4\times$ on GPT AI workloads, $1.9\times$ on MoE AI workloads, and $2.4\times$ on standard benchmarks. Further, the makespans achieved by Spectra consistently approach newly derived lower bounds.
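To make the Decompose, Schedule, Equalize pipeline concrete, the following is a minimal toy sketch and not Spectra itself. Every function name here is hypothetical, and each step uses a simple stand-in: a brute-force greedy cover for the decomposition (the abstract does not say how the minimal set is obtained), longest-first assignment to the least-loaded switch for scheduling, and a split of the largest permutation on the busiest switch for equalization. It assumes that every configuration placed on a switch costs its weight plus one reconfiguration delay `delta`.

```python
from itertools import permutations

def decompose(D):
    """Greedily cover D with weighted permutations (brute force, toy sizes only)."""
    n = len(D)
    R = [row[:] for row in D]  # residual demand still to be served
    result = []
    while any(v > 0 for row in R for v in row):
        # Pick the permutation that serves the most residual demand.
        best = max(permutations(range(n)),
                   key=lambda p: sum(R[i][p[i]] for i in range(n)))
        # Hold the circuit long enough to drain its largest covered entry.
        w = max(R[i][best[i]] for i in range(n))
        for i in range(n):
            R[i][best[i]] = 0
        result.append((w, best))
    return result

def load(switch, delta):
    """Time a switch is busy: each configuration costs weight + delta (assumption)."""
    return sum(w + delta for w, _ in switch)

def makespan(switches, delta):
    return max(load(sw, delta) for sw in switches)

def schedule(perms, s, delta):
    """Load-aware assignment: heaviest permutation first, to the least-loaded switch."""
    switches = [[] for _ in range(s)]
    for w, p in sorted(perms, key=lambda t: -t[0]):
        min(switches, key=lambda sw: load(sw, delta)).append((w, p))
    return switches

def equalize(switches, delta, rounds=100):
    """Move part of the largest permutation from the busiest to the idlest switch."""
    for _ in range(rounds):
        hi = max(switches, key=lambda sw: load(sw, delta))
        lo = min(switches, key=lambda sw: load(sw, delta))
        gap = load(hi, delta) - load(lo, delta)
        if gap <= delta:  # a split costs an extra delta, so it no longer pays off
            break
        k = max(range(len(hi)), key=lambda i: hi[i][0])
        w, p = hi[k]
        x = min((gap - delta) / 2, w)  # amount of weight to move
        if x >= w:
            hi.pop(k)                  # move the whole permutation
        else:
            hi[k] = (w - x, p)         # split: keep w - x here, send x away
        lo.append((x, p))
    return switches
```

For the hypothetical demand `D = [[4, 1, 0], [0, 4, 1], [1, 0, 4]]` with `s = 2` and `delta = 0.5`, this sketch decomposes `D` into two permutations of weight 4 and 1, schedules them with a makespan of 4.5, and equalizes the two switches to a makespan of 3.25. The brute-force search over all permutations is only viable for very small matrices.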