Autoregressive models, often built on Transformer architectures, are a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is slow. Caching strategies have proven effective for accelerating traditional video diffusion models, but existing methods assume uniform denoising across all frames, an assumption that breaks down in autoregressive models, where different video chunks exhibit different similarity patterns at the same timestep. In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our key insight is that each video chunk should maintain its own caching policy, allowing fine-grained control over which chunks require recomputation at each timestep. We introduce a chunkwise caching strategy that dynamically adapts to the denoising characteristics of each chunk, complemented by a KV cache compression mechanism that jointly optimizes for importance and redundancy, keeping memory within a fixed bound while preserving generation quality. Our method achieves speedups of 2.38× on MAGI-1 and 6.7× on SkyReels-V2 with negligible quality degradation (the VBench score increases by 0.87 on MAGI-1 and decreases by 0.79 on SkyReels-V2). These results demonstrate that FlowCache unlocks the potential of autoregressive models for real-time, ultra-long video generation and establishes a new benchmark for efficient video synthesis at scale. The code is available at https://github.com/mikeallen39/FlowCache.
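As a rough illustration of the chunkwise idea, the sketch below gives every chunk its own cache state and decides independently, at each denoising step, whether that chunk is recomputed or served from cache. This is not FlowCache's implementation: the relative-L1 change metric, the accumulated-change threshold, the residual reuse, and all names (`ChunkCache`, `denoise_step`, `threshold`, `model`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """Relative L1 change between two inputs (hypothetical similarity metric)."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous) or 1e-8
    return num / den


@dataclass
class ChunkCache:
    """Per-chunk cache state; each chunk carries its own policy (assumed design)."""
    threshold: float
    last_input: Optional[Vector] = None
    cached_residual: Optional[Vector] = None
    accumulated_change: float = 0.0

    def step(self, x: Vector, model: Callable[[Vector], Vector]) -> Tuple[Vector, bool]:
        """Return (output, recomputed) for this chunk at the current timestep."""
        if self.last_input is not None and self.cached_residual is not None:
            # Accumulate how far the input has drifted since the last full compute.
            self.accumulated_change += relative_l1(x, self.last_input)
            if self.accumulated_change < self.threshold:
                # Small drift: reuse the cached residual instead of calling the model.
                self.last_input = list(x)
                return [xi + ri for xi, ri in zip(x, self.cached_residual)], False
        # Large drift (or first step): run the model and refresh the cache.
        out = model(x)
        self.cached_residual = [o - xi for o, xi in zip(out, x)]
        self.last_input = list(x)
        self.accumulated_change = 0.0
        return out, True


def denoise_step(
    chunk_inputs: Sequence[Vector],
    caches: Sequence[ChunkCache],
    model: Callable[[Vector], Vector],
) -> Tuple[List[Vector], List[bool]]:
    """Apply one timestep to all chunks, each deciding on its own whether to recompute."""
    outputs: List[Vector] = []
    recomputed: List[bool] = []
    for x, cache in zip(chunk_inputs, caches):
        out, did_recompute = cache.step(x, model)
        outputs.append(out)
        recomputed.append(did_recompute)
    return outputs, recomputed
```

In this sketch, two chunks at the same timestep can take different paths: a chunk whose input has barely changed reuses its cached residual, while a chunk whose input has changed substantially triggers a full model call. A single global policy would force both chunks onto the same path.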