Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference speed due to their sequential token-by-token prediction process. In this paper, we propose a simple yet effective approach for parallelized autoregressive visual generation that improves generation efficiency while preserving the advantages of autoregressive modeling. Our key insight is that parallel generation depends on visual token dependencies: tokens with weak dependencies can be generated in parallel, while strongly dependent adjacent tokens are difficult to generate together, as their independent sampling may lead to inconsistencies. Based on this observation, we develop a parallel generation strategy that generates distant tokens with weak dependencies in parallel while maintaining sequential generation for strongly dependent local tokens. Our approach can be seamlessly integrated into standard autoregressive models without modifying the architecture or tokenizer. Experiments on ImageNet and UCF-101 demonstrate that our method achieves a 3.6x speedup with comparable quality and up to 9.5x speedup with minimal quality degradation across both image and video generation tasks. We hope this work will inspire future research in efficient visual generation and unified autoregressive modeling. Project page: https://epiphqny.github.io/PAR-project.
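The scheduling idea above (sample spatially distant tokens together, keep nearby tokens sequential) can be illustrated with a minimal sketch. This is an assumed simplification, not the paper's exact algorithm: the token grid is divided into an R x R grid of regions, and at each step the same relative position in every region is emitted in parallel, so tokens sampled simultaneously are far apart.

```python
# Hypothetical sketch of a distance-based parallel generation schedule
# (an assumed simplification, not the paper's exact procedure).
# An H x W token grid is split into r x r regions; each step emits the
# same within-region offset across all regions, so tokens sampled in the
# same step are spatially distant and only weakly dependent.

def parallel_schedule(h, w, r=2):
    """Return a list of steps; each step is a list of (row, col) token
    positions that can be sampled in parallel, one per region."""
    assert h % r == 0 and w % r == 0, "grid must divide evenly into regions"
    rh, rw = h // r, w // r  # region height and width
    steps = []
    for i in range(rh):      # sequential raster scan within a region
        for j in range(rw):
            # the same offset (i, j) in every region -> distant tokens
            steps.append([(bi * rh + i, bj * rw + j)
                          for bi in range(r) for bj in range(r)])
    return steps

if __name__ == "__main__":
    sched = parallel_schedule(4, 4, r=2)
    print(len(sched))   # 4 sequential steps instead of 16
    print(sched[0])     # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

With r x r regions, the number of sequential steps drops by a factor of r^2 relative to token-by-token decoding, which is the source of the speedup the abstract reports.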