The Space-Time Video Super-Resolution (STVSR) task aims to enhance the visual quality of videos by simultaneously performing video frame interpolation (VFI) and video super-resolution (VSR). However, because of the additional temporal dimension and scale inconsistency, most existing STVSR methods are complex and cannot flexibly model different motion amplitudes. In this work, we find that choosing an appropriate processing scale yields substantial benefits in flow-based feature propagation. We propose a novel Scale-Adaptive Feature Aggregation (SAFA) network that adaptively selects sub-networks with different processing scales for individual samples. Experiments on four public STVSR benchmarks demonstrate that SAFA achieves state-of-the-art performance. SAFA outperforms recent state-of-the-art methods such as TMNet and VideoINR by more than 0.5 dB in PSNR on average, while requiring less than half the parameters and only one third of the computational cost.
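To make the idea of per-sample scale selection concrete, the following is a minimal sketch and not the SAFA implementation. The selector rule (mean absolute frame difference compared against fixed thresholds), the scale set `SCALES`, and the placeholder "sub-network" that simply blends the two frames are all assumptions introduced for illustration; the abstract does not specify how the selection is made.

```python
import numpy as np

SCALES = (1, 2, 4)  # hypothetical downsampling factors, one per sub-network


def avg_pool(x, k):
    """Downsample an (H, W) array by averaging non-overlapping k x k blocks."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))


def upsample_nearest(x, k):
    """Upsample an (H, W) array by repeating each value k times per axis."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)


def select_scale(frame0, frame1, thresholds=(0.05, 0.15)):
    """Hypothetical selector: a larger frame difference picks a coarser scale."""
    motion = np.abs(frame1 - frame0).mean()
    index = int(np.searchsorted(thresholds, motion))
    return SCALES[index]


def process_at_scale(frame0, frame1, k):
    """Placeholder sub-network: blend the two frames at 1/k resolution."""
    low0, low1 = avg_pool(frame0, k), avg_pool(frame1, k)
    blended = 0.5 * (low0 + low1)
    return upsample_nearest(blended, k)


def scale_adaptive_forward(frame0, frame1):
    """Route one sample through the sub-network chosen for it."""
    k = select_scale(frame0, frame1)
    return k, process_at_scale(frame0, frame1, k)
```

In this sketch, a pair of identical frames is routed to the full-resolution path, while a pair with a large intensity change is routed to the coarsest path. A learned selector would replace the hand-written threshold rule.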
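For readers less familiar with PSNR, a short sketch can put the reported 0.5 dB gain in terms of reconstruction error. It uses the standard definition PSNR = 10 log10(MAX^2 / MSE); the function names and the example MSE value are illustrative and do not come from the paper.

```python
import math


def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink to raise PSNR by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


ratio = mse_ratio_for_gain(0.5)
baseline_mse = 100.0  # arbitrary illustrative value
improved_mse = baseline_mse * ratio
gain = psnr(improved_mse) - psnr(baseline_mse)
print(f"MSE ratio for +0.5 dB: {ratio:.4f}")
print(f"PSNR gain check: {gain:.2f} dB")
```

Under this definition, a 0.5 dB gain corresponds to multiplying the mean squared error by about 0.89, which is roughly an 11% reduction, independent of the baseline error.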