Diffusion models have demonstrated exceptional success in video super-resolution (VSR), exhibiting powerful capabilities for generating fine-grained details. However, their potential for space-time video super-resolution (STVSR), which necessitates not only recovering realistic high-resolution visual content but also improving the frame rate with coherent temporal dynamics, remains largely underexplored. Moreover, existing STVSR methods predominantly address spatiotemporal upsampling under simple degradation assumptions, thus failing in real-world scenarios with complex unknown degradations. To address these challenges, we propose OSDEnhancer, the first framework that achieves robust STVSR in one-step diffusion. OSDEnhancer begins with a linear initialization to establish essential spatiotemporal structures and adapt the model for one-step reconstruction. It then applies a divide-and-conquer strategy, introducing the temporal coherence (TC) and texture enrichment (TE) LoRAs that progressively specialize in inter-frame dynamics modeling and fine-grained texture recovery, respectively, while collaborating during inference for enhanced overall performance. A bidirectional VAE decoder employs deformable recurrent blocks to leverage the multi-scale structure of the vanilla VAE, enhancing latent-to-pixel reconstruction through joint multi-scale deformable aggregation and inter-frame feature propagation. Experimental results demonstrate that the proposed method attains state-of-the-art performance with superior generalization in real-world scenarios. The code is available at https://github.com/W-Shuoyan/OSDEnhancer.
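To illustrate how two specialized LoRAs can cooperate at inference, the sketch below folds two low-rank updates into one frozen weight matrix as W' = W + s_tc * B_tc A_tc + s_te * B_te A_te. This is only an assumption about one common way to combine adapters; the abstract does not specify how the TC and TE LoRAs are merged, and the names `merge_loras`, `tc_lora`, `te_lora`, the scales, and the toy shapes are all hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(base: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return base + scale * delta, element-wise."""
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(base, delta)]


def merge_loras(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    scales: Sequence[float],
) -> Matrix:
    """Fold several low-rank updates B @ A into a copy of the frozen base weight."""
    merged = [row[:] for row in base]  # leave the base weight untouched
    for (b, a), scale in zip(adapters, scales):
        merged = add_scaled(merged, matmul(b, a), scale)
    return merged


# Hypothetical toy setup: a 2x2 frozen weight and two rank-1 adapters (B, A).
base_weight = [[1.0, 0.0], [0.0, 1.0]]
tc_lora = ([[1.0], [0.0]], [[0.0, 2.0]])  # stands in for the temporal coherence LoRA
te_lora = ([[0.0], [1.0]], [[3.0, 0.0]])  # stands in for the texture enrichment LoRA

# Assumed merge scales; real values would be a design or tuning choice.
merged_weight = merge_loras(base_weight, [tc_lora, te_lora], scales=[1.0, 0.5])
print(merged_weight)
```

Because each adapter contributes an additive low-rank term, the two updates can be trained separately and still be applied together in a single forward pass, which is compatible with one-step inference. Whether OSDEnhancer merges its adapters this way is not stated in the abstract.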