Diffusion Transformers are fundamental to video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only to critical key-value blocks, it degrades at high sparsity because it discards context. In this work, we discover that the attention scores of non-critical blocks exhibit distributional stability, allowing them to be approximated accurately and efficiently rather than discarded, a property that is essential for sparse attention design. Motivated by this key insight, we propose PISA, a training-free Piecewise Sparse Attention that covers the full attention span with sub-quadratic complexity. Unlike the conventional keep-or-drop paradigm, which simply discards non-critical block information, PISA introduces a novel exact-or-approximate strategy: it computes critical blocks exactly while efficiently approximating the remainder through block-wise Taylor expansion. This design allows PISA to serve as a faithful proxy for full attention, effectively bridging the gap between speed and quality. Experimental results demonstrate that PISA achieves 1.91× and 2.57× speedups on Wan2.1-14B and Hunyuan-Video, respectively, while consistently maintaining the highest quality among sparse attention methods. Notably, even for image generation on FLUX, PISA achieves a 1.2× acceleration without compromising visual quality. Code is available at: https://github.com/xie-lab-ml/piecewise-sparse-attention.
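The exact-or-approximate split can be illustrated with a minimal single-query sketch. This is an illustrative reference only, not the paper's implementation: the block-selection rule (top-k by mean score) and all names here are assumptions, and this version still scores every key for clarity, whereas the paper's kernel exploits the approximation to avoid full computation. Critical blocks get exact softmax weights; the rest get weights from a first-order Taylor expansion of exp around each block's mean score.

```python
import math

def piecewise_sparse_attention(q, K, V, block_size=4, top_k=1):
    """Piecewise attention for one query vector q over keys K and values V.

    Exact exp() weights on the top_k "critical" key blocks; first-order
    Taylor-approximated weights, expanded around each block's mean score,
    on the remaining blocks. Block selection by mean score is an
    illustrative stand-in for the paper's criterion.
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    # Reference version scores all keys; a real kernel would not.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    blocks = [range(i, min(i + block_size, len(K)))
              for i in range(0, len(K), block_size)]
    means = [sum(scores[j] for j in b) / len(b) for b in blocks]
    critical = set(sorted(range(len(blocks)), key=lambda i: means[i])[-top_k:])

    num = [0.0] * len(V[0])
    den = 0.0
    for i, b in enumerate(blocks):
        for j in b:
            if i in critical:
                w = math.exp(scores[j])  # exact weight
            else:
                sbar = means[i]
                # exp(s) ≈ exp(sbar) * (1 + (s - sbar)): Taylor around block mean
                w = math.exp(sbar) * (1.0 + scores[j] - sbar)
            den += w
            num = [a + w * v for a, v in zip(num, V[j])]
    return [a / den for a in num]
```

When every block is marked critical, the routine reduces exactly to full softmax attention; with fewer critical blocks, the approximated blocks still contribute their (stabilized) mass instead of being dropped, which is the point of the exact-or-approximate design.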