Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as the Diffusion Transformer (DiT). However, generating high-quality content requires longer sequence lengths, which causes the computation of the attention mechanism to grow quadratically and escalates DiT inference latency. Parallel inference is essential for real-time DiT deployment, but relying on a single parallel method is impractical due to poor scalability at large scales. This paper introduces xDiT, a comprehensive parallel inference engine for DiTs. After thoroughly investigating existing DiT parallel approaches, xDiT adopts Sequence Parallelism (SP) and PipeFusion, a novel patch-level pipeline parallel method, as intra-image parallel strategies, alongside CFG parallelism for inter-image parallelism. xDiT can flexibly combine these parallel approaches in a hybrid manner, offering a robust and scalable solution. Experimental results on two 8xL40 GPU (PCIe) nodes interconnected by Ethernet and an 8xA100 (NVLink) node showcase xDiT's exceptional scalability across five state-of-the-art DiTs. Notably, we are the first to demonstrate DiT scalability on Ethernet-connected GPU clusters. xDiT is available at https://github.com/xdit-project/xDiT.