Effectively aggregating temporal information across consecutive frames is central to video super-resolution (VSR). Many researchers have used sliding-window and recurrent structures to gather spatio-temporal information from frames. However, although the performance of VSR models keeps improving, model size is also growing, which increases the demands placed on hardware. To reduce this burden, we propose a novel lightweight recurrent grouping attention network. The model has only 0.878M parameters, far fewer than current mainstream VSR models. We design a forward feature extraction module and a backward feature extraction module to collect temporal information between consecutive frames from two directions. Moreover, we propose a new grouping mechanism to efficiently collect spatio-temporal information from the reference frame and its neighboring frames. An attention supplementation module further extends the information-gathering range of the model. A feature reconstruction module aggregates information from the different directions to reconstruct high-resolution features. Experiments demonstrate that our model achieves state-of-the-art performance on multiple datasets.
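The idea of collecting temporal information from two directions can be illustrated with a minimal sketch. It is not the network described above: `propagate`, `toy_step`, and `bidirectional_features` are hypothetical names, frames are toy lists of floats, and the learned extraction modules are replaced by a simple averaging step chosen only to make the data flow visible.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Step = Callable[[Feature, Feature], Feature]


def toy_step(frame: Feature, hidden: Feature) -> Feature:
    # Hypothetical stand-in for a learned extraction module:
    # blend the current frame with the state carried from earlier frames.
    return [0.5 * f + 0.5 * h for f, h in zip(frame, hidden)]


def propagate(frames: Sequence[Feature], step: Step, reverse: bool = False) -> List[Feature]:
    # Run one recurrent pass over the sequence, carrying a hidden state.
    indices = range(len(frames) - 1, -1, -1) if reverse else range(len(frames))
    hidden: Feature = [0.0] * len(frames[0])
    outputs: List[Feature] = [[] for _ in frames]
    for t in indices:
        hidden = step(frames[t], hidden)
        outputs[t] = hidden
    return outputs


def bidirectional_features(frames: Sequence[Feature], step: Step = toy_step) -> List[Feature]:
    # The forward pass sees past frames, the backward pass sees future frames.
    forward = propagate(frames, step)
    backward = propagate(frames, step, reverse=True)
    # Assumption: fuse the two directions by concatenation per frame.
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_features([[1.0], [2.0], [3.0]]))
```

In this sketch each frame ends up with one feature that summarizes the frames before it and one that summarizes the frames after it; a reconstruction stage would then consume both.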