Space-Time Video Super-Resolution (STVSR) aims to enhance the visual quality of videos by performing video frame interpolation (VFI) and video super-resolution (VSR) simultaneously. However, because of the additional temporal dimension and scale inconsistency, most existing STVSR methods are complex and inflexible when dynamically modeling different motion amplitudes. In this work, we find that choosing an appropriate processing scale brings remarkable benefits to flow-based feature propagation. We propose a novel Scale-Adaptive Feature Aggregation (SAFA) network that adaptively selects sub-networks with different processing scales for individual samples. Experiments on four public STVSR benchmarks demonstrate that SAFA achieves state-of-the-art performance: it outperforms recent state-of-the-art methods such as TMNet and VideoINR by more than 0.5 dB in PSNR on average, while requiring less than half the number of parameters and only one third of the computational cost.
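The reported PSNR gain is easier to interpret with the standard PSNR definition in hand, PSNR = 10 log10(MAX^2 / MSE). The following is a minimal sketch of that definition, not the authors' evaluation code: the function names `psnr` and `mse_ratio_for_gain`, the 8-bit peak value of 255, and the toy pixel values are assumptions made purely for illustration.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 assumed here for 8-bit data).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Ratio MSE_improved / MSE_baseline implied by a PSNR gain of `gain_db` dB."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Toy, made-up pixel values (not taken from any benchmark).
    reference = [10, 20, 30, 40]
    estimate = [12, 18, 33, 41]
    print(f"PSNR of toy example: {psnr(reference, estimate):.2f} dB")
    print(f"MSE ratio for a 0.5 dB gain: {mse_ratio_for_gain(0.5):.4f}")
```

Under this definition, a 0.5 dB gain on a single comparison corresponds to roughly 11% lower mean squared error. Because the figure in the abstract is an average over benchmarks, this conversion is only a rough guide to its magnitude.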