Model parallelism has become necessary for training large neural networks. However, finding a suitable model-parallel schedule for an arbitrary neural network is non-trivial because the search space grows explosively. In this work, we present TAP, a model parallelism framework that automatically searches for the best data and tensor parallel schedules. Leveraging the key insight that a neural network can be represented as a directed acyclic graph, within which only a limited set of frequent subgraphs may exist, we design a graph pruning algorithm that folds the search space efficiently. TAP runs at sub-linear complexity with respect to the neural network size. Experiments show that TAP is $20\times$ to $160\times$ faster than the state-of-the-art automatic parallelism framework, and that the performance of its discovered schedules is competitive with expert-engineered ones.
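The abstract does not spell out the pruning algorithm, so the following is only a minimal sketch of the underlying idea of folding repeated subgraphs, not TAP's actual method. It assumes a toy graph in which operators are grouped into named blocks and each block lists its operators in a consistent order. The helper names `block_signature` and `fold_blocks`, the block names, and the op types are all hypothetical.

```python
from collections import defaultdict


def block_signature(ops, edges):
    """Name-independent signature of a block: its op types plus its internal edges.

    Assumption: `ops` is a list of (op_name, op_type) pairs in a consistent order,
    so structurally identical blocks yield identical signatures.
    """
    index = {name: i for i, (name, _) in enumerate(ops)}
    op_types = tuple(op_type for _, op_type in ops)
    internal = tuple(sorted(
        (index[src], index[dst])
        for src, dst in edges
        if src in index and dst in index  # ignore edges that cross block boundaries
    ))
    return op_types, internal


def fold_blocks(blocks, edges):
    """Group blocks that share a signature; only one block per group needs searching."""
    groups = defaultdict(list)
    for block_name, ops in blocks.items():
        groups[block_signature(ops, edges)].append(block_name)
    return list(groups.values())


# Toy model (hypothetical): one embedding block followed by four identical layers.
blocks = {"embed": [("embed/lookup", "gather")]}
edges = []
prev = "embed/lookup"
for i in range(4):
    p = f"layer_{i}"
    ops = [(f"{p}/qkv", "matmul"), (f"{p}/softmax", "softmax"), (f"{p}/out", "matmul")]
    blocks[p] = ops
    edges += [(prev, ops[0][0]), (ops[0][0], ops[1][0]), (ops[1][0], ops[2][0])]
    prev = ops[-1][0]

groups = fold_blocks(blocks, edges)
ops_total = sum(len(ops) for ops in blocks.values())
ops_after_folding = sum(len(blocks[group[0]]) for group in groups)
print(groups)
print(ops_total, ops_after_folding)
```

In this toy case the four layers collapse into a single group, so a schedule search would need to consider the operators of one representative layer and reuse the result for the others. The number of operators to search then grows with the number of distinct blocks rather than with model depth, which is one intuition for how sub-linear search cost can arise.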