In this paper, we address the problem of video super-resolution (VSR) using Diffusion Models (DM) and present StableVSR. Our method significantly enhances the perceptual quality of upscaled videos by synthesizing realistic and temporally consistent details. We turn a pre-trained DM for single image super-resolution into a VSR method by introducing the Temporal Conditioning Module (TCM). TCM uses Temporal Texture Guidance, which provides spatially aligned, detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality, temporally consistent results. We also introduce a Frame-wise Bidirectional Sampling strategy that encourages the use of information both from past to future frames and vice versa. This strategy improves both the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos compared to existing state-of-the-art VSR methods. The code is available at https://github.com/claudiom4sir/StableVSR.
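The interplay between Frame-wise Bidirectional Sampling and Temporal Texture Guidance can be illustrated with a minimal sketch. It rests on these assumptions: the processing direction alternates at every denoising step (past to future, then future to past), and each frame is guided by its already-processed neighbour's clean-frame estimate from the same step, warped onto the current frame. The first frame of each pass receives no guidance. The names `bidirectional_schedule`, `sample_video`, `denoise`, and `warp` are hypothetical placeholders, not the StableVSR API; see the linked repository for the authors' implementation.

```python
from typing import Any, Callable, List, Optional, Tuple


def bidirectional_schedule(num_frames: int, num_steps: int) -> List[Tuple[int, int, Optional[int]]]:
    """Return (step, frame, guide_frame) triples in processing order.

    Assumption: even steps run past -> future (frame i guided by i-1),
    odd steps run future -> past (frame i guided by i+1). The first frame
    visited in each pass has no processed neighbour, so guide_frame is None.
    """
    schedule = []
    for step in range(num_steps):
        forward = step % 2 == 0
        order = range(num_frames) if forward else range(num_frames - 1, -1, -1)
        prev: Optional[int] = None
        for i in order:
            schedule.append((step, i, prev))
            prev = i
    return schedule


def sample_video(
    latents: List[Any],
    num_steps: int,
    denoise: Callable[[Any, int, Optional[Any]], Tuple[Any, Any]],
    warp: Callable[[Any, int, int], Any],
) -> List[Any]:
    """Hypothetical sampling loop.

    denoise(x_t, step, guidance) -> (x_next, x0_estimate) stands in for one
    conditioned diffusion step; warp(value, src, dst) stands in for aligning
    a neighbour's estimate onto the current frame (e.g. via optical flow).
    """
    x = list(latents)
    x0_hat: List[Optional[Any]] = [None] * len(x)
    for step, i, g in bidirectional_schedule(len(x), num_steps):
        # Guidance comes from the neighbour's clean-frame estimate computed
        # earlier in this same pass, aligned to frame i.
        guidance = None if g is None else warp(x0_hat[g], g, i)
        x[i], x0_hat[i] = denoise(x[i], step, guidance)
    return x
```

For three frames and two steps, the schedule visits frames 0, 1, 2 (each guided by its predecessor) and then 2, 1, 0 (each guided by its successor). Alternating direction per step, rather than running one full forward pass followed by one backward pass, lets information from both temporal directions influence every frame throughout the denoising trajectory.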