Coflow is a key application-layer abstraction for capturing communication patterns; it enables efficient coordination of parallel data flows to reduce job completion times in distributed systems. Modern data center networks (DCNs) employ multiple independent optical circuit switching (OCS) cores operating concurrently to meet the massive bandwidth demands of application jobs. However, existing coflow scheduling research focuses primarily on the single-core setting and has considered multi-core fabrics only for electrical packet switching (EPS) networks. To address this gap, this paper studies the coflow scheduling problem in multi-core OCS networks under the not-all-stop reconfiguration model, in which reconfiguring one circuit does not interrupt the other circuits. The challenges stem from two aspects: (i) cross-core coupling induced by traffic assignment across heterogeneous cores; and (ii) per-core OCS scheduling constraints, namely port exclusivity and reconfiguration delay. We propose an approximation algorithm that integrates cross-core flow assignment with per-core circuit scheduling to minimize the total weighted coflow completion time (CCT), and we establish a provable worst-case performance guarantee for it. Furthermore, the algorithmic framework applies directly to the multi-core EPS setting, where it yields a corresponding approximation ratio for packet-switched fabrics. Trace-driven simulations using real Facebook workloads demonstrate that our algorithm effectively reduces both weighted CCT and tail CCT.
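To make the objective and the port-exclusivity constraint concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical `Flow` record for an already-scheduled flow, assumes all coflows are released at time 0, and assumes each flow's interval `[start, end)` already includes any reconfiguration delay for setting up its circuit. The names `Flow`, `weighted_cct`, and `respects_port_exclusivity` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A scheduled flow (hypothetical record, not from the paper)."""
    coflow: str   # identifier of the coflow this flow belongs to
    src: int      # input port
    dst: int      # output port
    core: int     # OCS core the flow is assigned to
    start: float  # time the circuit starts being set up
    end: float    # time the flow finishes transmitting


def weighted_cct(flows, weights):
    """Total weighted CCT, where a coflow completes when its last flow ends.

    Assumes every coflow is released at time 0.
    """
    cct = {}
    for f in flows:
        cct[f.coflow] = max(cct.get(f.coflow, 0.0), f.end)
    return sum(weights[c] * t for c, t in cct.items())


def respects_port_exclusivity(flows):
    """Check that, on each core, no two time-overlapping flows share an
    input port or share an output port."""
    for i, a in enumerate(flows):
        for b in flows[i + 1:]:
            if a.core != b.core:
                continue  # different cores operate independently
            shares_port = a.src == b.src or a.dst == b.dst
            overlaps = a.start < b.end and b.start < a.end
            if shares_port and overlaps:
                return False
    return True


flows = [
    Flow("A", src=0, dst=1, core=0, start=0.0, end=3.0),
    Flow("A", src=0, dst=2, core=1, start=0.0, end=2.0),
    Flow("B", src=1, dst=2, core=0, start=0.0, end=4.0),
]
weights = {"A": 2.0, "B": 1.0}

print(weighted_cct(flows, weights))      # 2.0 * 3.0 + 1.0 * 4.0 = 10.0
print(respects_port_exclusivity(flows))  # True
```

In this toy schedule, coflow A spreads its two flows from input port 0 across two cores, which is the cross-core assignment decision the abstract refers to. Placing both on the same core would violate port exclusivity unless they were serialized.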