In the All-Reduce problem, each of the K nodes holds an input and wishes to compute the sum of all K inputs over a communication network in which each pair of nodes is connected by a parallel link with arbitrary bandwidth. The computation rate of All-Reduce is defined as the number of sum instances that can be computed per network use. For the computation rate, we provide a cut-set upper bound and a linear programming lower bound based on time (bandwidth) sharing over all schemes that first perform Reduce (aggregating all inputs at one node) and then perform Broadcast (sending the sum from that node to all other nodes). Specializing the two general bounds yields the optimal computation rate for a class of communication networks and the best-known rate bounds (where the upper bound is no more than twice the lower bound) for cyclic, complete, and hypercube networks.
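To make the Reduce-then-Broadcast structure concrete, the following is a minimal sketch of one such scheme on a single spanning tree. It is only an illustration under stated assumptions: the function name `reduce_then_broadcast`, the choice of a single tree, and the unit-cost link model are hypothetical and are not taken from the paper, which optimizes time (bandwidth) sharing over all such schemes via a linear program.

```python
from collections import defaultdict


def reduce_then_broadcast(inputs, tree_edges, root):
    """Toy All-Reduce: Reduce to `root` along a spanning tree, then Broadcast.

    inputs     -- dict mapping node -> numeric input (hypothetical format)
    tree_edges -- list of (u, v) pairs assumed to form a spanning tree
    root       -- node at which all inputs are aggregated
    Returns (result per node, number of single-link transmissions used).
    """
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    # Orient the tree away from the root with an iterative DFS.
    parent = {root: None}
    order = []  # visit order: every parent appears before its children
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)

    # Reduce phase: children send partial sums to parents (leaves first).
    partial = dict(inputs)
    transmissions = 0
    for node in reversed(order):
        if parent[node] is not None:
            partial[parent[node]] += partial[node]
            transmissions += 1

    # Broadcast phase: the root's total flows back down the same tree.
    result = {root: partial[root]}
    for node in order:
        if parent[node] is not None:
            result[node] = result[parent[node]]
            transmissions += 1

    return result, transmissions


if __name__ == "__main__":
    values = {0: 1, 1: 2, 2: 3, 3: 4}
    path = [(0, 1), (1, 2), (2, 3)]
    print(reduce_then_broadcast(values, path, root=0))
```

In this toy model every node ends with the full sum, and one sum instance costs 2(K-1) link transmissions, K-1 for each phase. The sketch ignores link bandwidths entirely, so it says nothing about the computation rate achievable on a given network.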