Large-scale language models have become increasingly challenging and expensive to train. Among the methods that address this issue, pipeline parallelism is widely used to fit massive model weights within limited GPU memory. This paper introduces Hanayo, a wave-like pipeline parallelism strategy with a concise structure and practical applicability, together with a high-performance pipeline execution runtime that tackles the challenges of implementing pipeline strategies. Hanayo mitigates the pipeline bubbles and excessive memory consumption prevalent in existing schemes without resorting to model duplicates as Chimera does. Our evaluation, conducted on four distinct computing clusters with both GPT-like and BERT-like architectures on up to 32 GPUs, demonstrates up to a 30.4% increase in throughput compared to the state-of-the-art approach.