Space-Time Video Super-Resolution (STVSR) aims to enhance the visual quality of videos by simultaneously performing video frame interpolation (VFI) and video super-resolution (VSR). However, because of the additional temporal dimension and scale inconsistency, most existing STVSR methods are complex and inflexible when modeling motions of different amplitudes. In this work, we find that choosing an appropriate processing scale yields substantial benefits in flow-based feature propagation. We propose a novel Scale-Adaptive Feature Aggregation (SAFA) network that adaptively selects sub-networks with different processing scales for individual samples. Experiments on four public STVSR benchmarks demonstrate that SAFA achieves state-of-the-art performance: it outperforms recent methods such as TMNet and VideoINR by an average of over 0.5 dB in PSNR, while requiring less than half the parameters and only one-third of the computational cost.
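To put the reported PSNR gain in perspective, the following is a minimal sketch that relies only on the standard definition of PSNR, 10·log10(MAX² / MSE), and not on anything specific to SAFA. The function names `psnr` and `mse_ratio_for_gain` are hypothetical helpers introduced for illustration, and a peak signal value of 1.0 is assumed.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Return PSNR in dB for a mean squared error and a peak signal value.

    Standard definition: PSNR = 10 * log10(max_value^2 / mse).
    """
    if mse <= 0:
        raise ValueError("mse must be positive; PSNR is unbounded for identical images")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Return the factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_for_gain(0.5)
    # A +0.5 dB PSNR gain corresponds to roughly an 11% lower MSE.
    print(f"MSE ratio for +0.5 dB: {ratio:.3f}")
    print(f"Relative MSE reduction: {(1.0 - ratio) * 100:.1f}%")
```

Under this definition, a 0.5 dB gain at a fixed peak value is equivalent to reducing the mean squared error by about one-ninth, regardless of the baseline PSNR.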