Parallel computing applications have become increasingly popular and are commonly executed in large data centers. These applications alternate between two phases, computation and communication, which are repeated until the work is complete. As the demand for computing power keeps growing, large data centers struggle to meet the resulting communication demands. To address this problem, coflow has been proposed as a networking abstraction that captures the communication patterns of data-parallel computing frameworks. This paper studies the coflow scheduling problem in identical parallel networks, where the objective is to minimize the makespan, that is, the maximum completion time over all coflows. The problem is $\mathcal{NP}$-hard and is regarded as one of the most significant such problems in large data centers. We consider two variants: flow-level scheduling and coflow-level scheduling. In flow-level scheduling, distinct flows of a coflow can be transferred through different network cores, whereas in coflow-level scheduling, all flows of a coflow must be transferred through the same network core. For the flow-level scheduling problem, we propose two algorithms: a $(3-\tfrac{2}{m})$-approximation algorithm and a $(\tfrac{8}{3}-\tfrac{2}{3m})$-approximation algorithm, where $m$ is the number of network cores. For the coflow-level scheduling problem, we propose a $(2m)$-approximation algorithm. Finally, we compare our proposed algorithms in simulation with the Weaver algorithm of Huang \textit{et al.} (2020), presented at the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). We also validate the effectiveness of the proposed algorithms on heterogeneous parallel networks.
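To make the stated guarantees concrete, the following minimal sketch evaluates the three approximation ratios for a few values of $m$. It is plain arithmetic on the formulas quoted above, not an implementation of the proposed algorithms; the function names `flow_level_ratios` and `coflow_level_ratio` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def flow_level_ratios(m: int) -> tuple[Fraction, Fraction]:
    """Return (3 - 2/m, 8/3 - 2/(3m)) exactly, for m network cores.

    Hypothetical helper: it only evaluates the two flow-level
    approximation ratios stated in the abstract.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    first = 3 - Fraction(2, m)
    second = Fraction(8, 3) - Fraction(2, 3 * m)
    return first, second


def coflow_level_ratio(m: int) -> int:
    """Return the coflow-level approximation ratio 2m (hypothetical helper)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 2 * m


if __name__ == "__main__":
    # Print the ratios side by side for a few core counts.
    for m in (1, 2, 4, 8, 16):
        first, second = flow_level_ratios(m)
        print(
            f"m={m:2d}  3-2/m={float(first):.3f}  "
            f"8/3-2/(3m)={float(second):.3f}  2m={coflow_level_ratio(m)}"
        )
```

Since $(3-\tfrac{2}{m}) - (\tfrac{8}{3}-\tfrac{2}{3m}) = \tfrac{m-4}{3m}$, the two flow-level bounds coincide at $m=4$; the first is smaller for $m<4$ and the second for $m>4$. This comparison concerns only the worst-case guarantees and says nothing about measured performance.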