Long-context training is crucial for extending the context window of large language models (LLMs). Existing schemes, such as sequence parallelism, incur substantial communication overhead. Pipeline parallelism (PP) reduces this cost, but its effectiveness hinges on the partitioning granularity. Batch-level PP, which packs sequences together, consumes a large amount of memory in long-context scenarios. Token-level PP, which splits sequences into slices, alleviates this memory overhead but may leave the hardware under-utilized. Moreover, sequence lengths in real-world datasets follow a skewed distribution, so PP with a single, static granularity performs sub-optimally. In this paper, we propose 1) \textit{Elastic Pipeline Parallelism} (EPP), which orchestrates token-level PP and batch-level PP to adapt to resource and workload heterogeneity, and 2) \textit{Stage-Aware Chunk-Level Adaptive Checkpointing}, which efficiently integrates gradient checkpointing with EPP. Comprehensive experiments demonstrate that InfiniPipe, our system built on these techniques, achieves a 1.69x speedup over state-of-the-art systems. Our code is open-sourced at https://github.com/wsjdsg/InfiniPipe.git.
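To make the idea of mixing granularities concrete, the following is a minimal sketch, not InfiniPipe's actual algorithm. It assumes a single fixed per-chunk token budget, the hypothetical parameter `chunk_tokens`, whereas EPP adapts its choices to resource and workload heterogeneity. Under that assumption, the hypothetical helper `build_chunks` slices any sequence longer than the budget into token-level pieces and greedily packs shorter sequences into batch-level chunks.

```python
from typing import List, Tuple

# A segment is (sequence index, start token, end token), end exclusive.
Segment = Tuple[int, int, int]


def build_chunks(lengths: List[int], chunk_tokens: int) -> List[List[Segment]]:
    """Hypothetical hybrid-granularity chunker (illustrative sketch only).

    Sequences longer than `chunk_tokens` are split into token-level slices,
    each slice forming its own pipeline micro-batch. Shorter sequences are
    packed together (batch-level) until adding another would exceed the budget.
    """
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    chunks: List[List[Segment]] = []
    current: List[Segment] = []  # open packed chunk for short sequences
    used = 0  # tokens already placed in `current`

    for idx, n in enumerate(lengths):
        if n > chunk_tokens:
            # Token-level: slice the long sequence. In a real pipeline the
            # slices must be scheduled in order, because each slice attends
            # to the keys/values of the slices before it.
            for start in range(0, n, chunk_tokens):
                chunks.append([(idx, start, min(start + chunk_tokens, n))])
        else:
            # Batch-level: flush the open chunk if this sequence would overflow it.
            if used + n > chunk_tokens:
                chunks.append(current)
                current, used = [], 0
            current.append((idx, 0, n))
            used += n

    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in build_chunks([10000, 1000, 2000, 500, 6000], 4096):
        print(chunk)
```

With a 4096-token budget, the 10000- and 6000-token sequences become five token-level slices, while the three short sequences (3500 tokens in total) share one packed chunk. Every chunk stays within the budget, which bounds per-micro-batch activation memory, while short sequences still fill the chunk well enough to keep the hardware busy.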