We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, and training-free approach for DiT-based video generation. Our method is founded on the observation that the attention difference between steps of the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this redundancy by broadcasting attention outputs to subsequent steps in a pyramid style: PAB applies a different broadcast strategy to each type of attention, chosen according to its variance, for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates superior results across three models compared to baselines and achieves real-time generation for videos of up to 720p. We anticipate that our simple yet effective method will serve as a robust baseline and facilitate future research on, and applications of, video generation.
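The broadcast idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that "broadcasting" means caching an attention output at one diffusion step and reusing it for a fixed number of following steps, and that each attention type gets its own reuse range. The class name `BroadcastCache`, the attention type names (`spatial`, `temporal`, `cross`), and the ranges 2, 4, and 6 are hypothetical choices made only for illustration. For simplicity the sketch reuses outputs across all steps rather than restricting reuse to any particular segment of the diffusion process.

```python
from typing import Any, Callable, Dict, Tuple


class BroadcastCache:
    """Reuses a cached attention output for a fixed number of consecutive steps.

    Hypothetical helper: `broadcast_range[kind]` is the number of consecutive
    steps served by one computation of attention type `kind`.
    """

    def __init__(self, broadcast_range: Dict[str, int]) -> None:
        self.broadcast_range = broadcast_range
        # Maps attention type -> (step at which it was computed, output).
        self._cache: Dict[str, Tuple[int, Any]] = {}
        # Counts how often each attention type was actually recomputed.
        self.compute_count: Dict[str, int] = {k: 0 for k in broadcast_range}

    def attend(self, kind: str, step: int, compute: Callable[[], Any]) -> Any:
        """Return the attention output for `kind` at `step`, reusing it if allowed."""
        cached = self._cache.get(kind)
        if cached is not None:
            cached_step, output = cached
            # Reuse while the cached output is still inside its broadcast range.
            if 0 <= step - cached_step < self.broadcast_range[kind]:
                return output
        # Otherwise recompute and refresh the cache.
        output = compute()
        self._cache[kind] = (step, output)
        self.compute_count[kind] += 1
        return output


def run_demo(num_steps: int = 12) -> Dict[str, int]:
    """Simulate a denoising loop and report how many times each type was computed."""
    # Assumed ranges: a type with lower variance is given a longer reuse range.
    cache = BroadcastCache({"spatial": 2, "temporal": 4, "cross": 6})
    for step in range(num_steps):
        for kind in ("spatial", "temporal", "cross"):
            # Stand-in for a real attention call; returns a labelled placeholder.
            cache.attend(kind, step, lambda: f"{kind}@{step}")
    return cache.compute_count
```

With these assumed ranges, a 12-step loop computes the three attention types 6, 3, and 2 times respectively instead of 12 times each, which is the kind of saving the pyramid-style schedule aims for.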