We present an algorithm, strongly polynomial-time in the topology size, that generates bandwidth-optimal allgather/reduce-scatter schedules on any network topology, with or without switches. The algorithm constructs pipeline schedules that provably achieve the best possible bandwidth performance on a given topology. To provide a universal solution, we model the network topology as a directed graph with heterogeneous link capacities and represent switches directly as vertices of the graph. This work builds heavily on earlier graph-theoretic results on edge-disjoint spanning trees and edge splitting. While we focus on allgather, the methods in this paper extend readily to generating schedules for reduce, broadcast, reduce-scatter, and allreduce.
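To make the graph model concrete, the following minimal sketch encodes a topology as a directed graph with per-link capacities and treats the switch as an ordinary vertex. It then evaluates a standard cut-based lower bound on allgather time: every shard that starts on a compute node inside a vertex set S must leave S to reach a compute node outside it, so the time is at least the number of compute nodes in S divided by the total capacity exiting S. The names `exiting_capacity` and `allgather_cut_bound`, the example topology, and the capacity values are illustrative assumptions, not taken from the paper. The enumeration is exponential in the number of vertices and is not the strongly polynomial-time algorithm described above; it only illustrates the kind of bound a bandwidth-optimal schedule has to meet.

```python
from itertools import combinations


def exiting_capacity(edges, subset):
    """Total capacity of directed edges that leave `subset`."""
    return sum(cap for (u, v), cap in edges.items()
               if u in subset and v not in subset)


def allgather_cut_bound(vertices, compute, edges):
    """Brute-force cut lower bound on allgather time, per unit shard size.

    Hypothetical helper for illustration only. It enumerates every vertex
    subset S that holds at least one compute node and misses at least one,
    and returns the maximum of |S & compute| / exiting_capacity(S).
    """
    verts = list(vertices)
    best = 0.0
    for r in range(1, len(verts)):
        for combo in combinations(verts, r):
            s = set(combo)
            inside = len(s & compute)
            if inside == 0 or compute <= s:
                continue  # no shard has to cross this cut
            out = exiting_capacity(edges, s)
            if out == 0:
                return float("inf")  # some shard can never leave S
            best = max(best, inside / out)
    return best


# Assumed example: four compute nodes attached to one switch "sw",
# with unit-capacity links in each direction.
compute = {"g0", "g1", "g2", "g3"}
vertices = compute | {"sw"}
edges = {}
for g in sorted(compute):
    edges[(g, "sw")] = 1.0  # uplink
    edges[("sw", g)] = 1.0  # downlink

bound = allgather_cut_bound(vertices, compute, edges)
print(bound)
```

In this assumed topology the binding cut is the switch together with three compute nodes: three shards must cross the single downlink to the fourth node. Halving that downlink's capacity doubles the bound, which is how heterogeneous link capacities enter the model.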