Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm incurs considerable inference latency. Although speculative decoding (SD) has proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, which limits its acceleration potential. Motivated by the interchangeability of visual tokens, we explore, for the first time, verification skipping in the SD process to explicitly cut the number of target model forward passes and thereby reduce inference latency. By analyzing the characteristics of the drafting stage, we observe that verification redundancy and stale feature reusability are key to maintaining generation quality while improving speed in verification-free steps. Building on these two observations, we propose VVS, a novel SD framework that accelerates visual AR models via partial verification skipping and integrates three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained skipped-step scheduling. As a result, VVS reduces the number of target model forward passes by a factor of $2.8\times$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a better speed-quality trade-off than conventional SD frameworks and showing strong potential to reshape the SD paradigm. Our code is available at https://github.com/HyattDD/VVS.
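To make the forward-pass argument concrete, the following is a minimal counting sketch, not the VVS implementation. It assumes a toy decoding loop in which every draft round commits a fixed number of tokens and every `skip_every`-th round skips target verification. The function name `target_passes`, its parameters, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
def target_passes(num_tokens: int, accepted_per_round: int, skip_every: int = 0) -> int:
    """Count target-model forward passes in a toy decoding loop.

    Assumptions (illustrative only):
      - each round commits exactly `accepted_per_round` tokens;
      - a verified round costs one target forward pass;
      - if `skip_every` > 0, every `skip_every`-th round skips verification
        and therefore costs no target forward pass.
    """
    produced = 0   # tokens committed so far
    rounds = 0     # draft rounds executed so far
    passes = 0     # target-model forward passes spent
    while produced < num_tokens:
        rounds += 1
        skipped = skip_every > 0 and rounds % skip_every == 0
        if not skipped:
            passes += 1
        produced += accepted_per_round
    return passes


# Vanilla AR decoding: one token per target forward pass.
vanilla = target_passes(1024, accepted_per_round=1)
# Conventional SD: every round is verified by one target forward pass.
conventional_sd = target_passes(1024, accepted_per_round=2)
# SD with partial verification skipping: every second round is verification-free.
with_skipping = target_passes(1024, accepted_per_round=2, skip_every=2)
```

Under these assumptions, conventional SD can lower the pass count only by accepting more tokens per round, whereas skipping verification removes target passes outright. The sketch ignores generation quality, which the three VVS modules are meant to preserve.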