While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions (e.g., 480P), leaving efficient, scalable, real-time high-resolution video generation a fundamental open challenge. To bridge this gap, we present Ultra Flash, a cascaded streaming framework for real-time high-resolution video generation. Ultra Flash achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU through three key contributions: (1) an architecture-preserving T2V-to-TV2V super-resolution training paradigm, coupled with an AIGC-oriented data degradation pipeline, that preserves the generative capability of the base model and thereby enhances high-resolution detail when cascaded after mainstream low-resolution generative models; (2) a causal streaming latent upsampler paired with a high-resolution decoder, which improves spatiotemporal coherence while enabling efficient spatial scaling of latents and precise high-resolution decoding with negligible computational overhead; and (3) an optimization scheme for cascaded high-resolution streaming video generation that first applies hybrid-reward-enhanced sparse causalization and single-step distillation to the super-resolution model, then introduces cascaded streaming self-forcing preference optimization with dynamic cache management; together, these steps improve overall coherence and quality and enable real-time high-resolution streaming video generation. Extensive experiments demonstrate that Ultra Flash reliably produces ultra-high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency. Project Page: https://xin1u.github.io/UltraFlash/
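To make the reported throughput and the cascaded streaming idea concrete, the following is a minimal illustrative sketch and not the Ultra Flash implementation. It converts the stated frame rates into average per-frame time budgets and mimics a chunk-wise cascade in which a second stage sees a bounded cache of earlier chunks. All names (`frame_budget_ms`, `low_res_stream`, `cascaded_upscale`, `cache_size`) are hypothetical, frames are represented by integer ids rather than tensors, and the fixed-size rolling cache is only a stand-in for the dynamic cache management described in the abstract.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per frame at a given throughput."""
    return 1000.0 / fps


def low_res_stream(num_chunks: int, frames_per_chunk: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a low-resolution base generator.

    Yields chunks of consecutive frame ids instead of real video latents.
    """
    for c in range(num_chunks):
        start = c * frames_per_chunk
        yield list(range(start, start + frames_per_chunk))


def cascaded_upscale(
    chunks: Iterable[List[int]], cache_size: int
) -> Iterator[List[Tuple[int, int]]]:
    """Hypothetical second stage that processes chunks as they arrive.

    Each output frame is tagged with the number of cached past chunks that
    were available as causal context. The cache is a fixed-size rolling
    window (an assumption), so memory stays bounded for long streams.
    """
    cache: Deque[List[int]] = deque(maxlen=cache_size)
    for chunk in chunks:
        context_chunks = len(cache)  # only past chunks are visible (causal)
        yield [(frame, context_chunks) for frame in chunk]
        cache.append(chunk)  # the oldest chunk is evicted once the cache is full


if __name__ == "__main__":
    for fps in (30, 18):
        print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
    for out in cascaded_upscale(low_res_stream(4, 3), cache_size=2):
        print(out)
```

At ~30 FPS the whole cascade has roughly 33 ms per frame on average, and at ~18 FPS roughly 56 ms, which is why the abstract emphasizes single-step distillation and negligible upsampling and decoding overhead.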