S2R: Exploring a Double-Win Transformer-Based Framework for Ideal and Blind Super-Resolution

Deep-learning-based methods now achieve impressive performance on ideal super-resolution (SR) datasets, but most of them suffer dramatic performance drops when applied directly to real-world SR reconstruction tasks with unpredictable blur kernels. To tackle this issue, blind SR methods have been proposed to improve visual results under random blur kernels; however, they in turn produce unsatisfactory reconstructions on ideal low-resolution images. In this paper, we propose S2R, a double-win framework for both ideal and blind SR tasks. It consists of a lightweight transformer-based SR model (S2R transformer) and a novel coarse-to-fine training strategy, and it achieves excellent visual results under both ideal and random blur conditions. At the algorithm level, the S2R transformer combines several efficient, lightweight blocks to enhance the representation ability of the extracted features with a relatively small number of parameters. For the training strategy, a coarse-level learning process is first performed on a large-scale external dataset to improve the generalization of the network; then, a fast fine-tuning process transfers the pre-trained model to real-world SR tasks by mining the internal features of the image. Experimental results show that the proposed S2R outperforms other single-image SR models in the ideal SR condition with only 578K parameters. Meanwhile, in blind blur conditions it achieves better visual results than regular blind SR models with only 10 gradient updates, converging 300 times faster and thus significantly accelerating the transfer-learning process in real-world situations.
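The abstract does not say how the fine-tuning stage builds its training data. One common way to "mine the internal features" of a test image, used in zero-shot SR approaches, is to degrade the low-resolution (LR) test image once more and learn the mapping from that downscaled "son" image back to the LR image itself. The NumPy sketch below illustrates the coarse-to-fine idea under that assumption. Everything in it is hypothetical: the toy model (nearest-neighbour upsampling plus a learnable 3x3 residual filter), the Gaussian kernels, the helper names (`gaussian_kernel`, `blur`, `degrade`, `shifted_patches`, `gd_steps`), the step size, and the assumption that the test blur kernel is already known or estimated. It is not the S2R transformer or the authors' training code.

```python
import numpy as np

# Hypothetical sketch: the toy model and helpers are illustrative assumptions,
# not the S2R transformer or its actual training procedure.

def gaussian_kernel(size=5, sigma=1.0):
    """Isotropic Gaussian blur kernel, normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    """'Same'-size 2-D correlation with reflect padding (odd-sized kernel)."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.zeros(img.shape)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def degrade(img, kernel, scale=2):
    """Assumed degradation model: blur with `kernel`, then subsample by `scale`."""
    return blur(img, kernel)[::scale, ::scale]

def shifted_patches(img, ksize=3):
    """Stack the ksize*ksize shifted copies of `img` seen by a ksize x ksize filter."""
    p = ksize // 2
    pad = np.pad(img, p, mode="reflect")
    h, w = img.shape
    return np.stack([pad[i:i + h, j:j + w] for i in range(ksize) for j in range(ksize)])

def gd_steps(lr, hr, w, steps, step_size=0.1, scale=2):
    """Plain gradient descent on the MSE of the toy model
    sr(lr) = up(lr) + conv3x3(up(lr), w), with nearest-neighbour up().
    Assumes even image sizes so that up(lr) and hr have the same shape."""
    up = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
    hr = hr[:up.shape[0], :up.shape[1]]
    patches = shifted_patches(up, w.shape[0])          # (k*k, H, W)
    w = w.copy()
    losses = []
    for _ in range(steps):
        resid = up + np.tensordot(w.ravel(), patches, axes=1) - hr
        losses.append(float(np.mean(resid ** 2)))
        # d(MSE)/dw[i, j] = 2/N * sum(resid * shifted copy (i, j))
        grad = 2.0 * np.tensordot(patches, resid, axes=([1, 2], [0, 1])) / resid.size
        w -= step_size * grad.reshape(w.shape)
    return w, losses

rng = np.random.default_rng(0)
ideal_k = gaussian_kernel(5, 0.5)   # stand-in for the "ideal" degradation
test_k = gaussian_kernel(5, 1.6)    # unknown real-world blur, assumed estimated beforehand

# Coarse stage: many updates on an external image degraded with the ideal kernel.
hr_ext = blur(rng.random((64, 64)), gaussian_kernel(5, 1.0))
w0, _ = gd_steps(degrade(hr_ext, ideal_k), hr_ext, np.zeros((3, 3)), steps=200)

# Fine stage: only 10 updates on an internal pair built from the LR test image alone.
lr_test = degrade(blur(rng.random((64, 64)), gaussian_kernel(5, 1.0)), test_k)  # 32x32
w1, losses = gd_steps(degrade(lr_test, test_k), lr_test, w0, steps=10)
print(len(losses), losses[-1] < losses[0])
```

The fine stage never sees a ground-truth high-resolution image: its target is the LR test image, and its input is a further-degraded copy of that image. This is what lets a handful of gradient updates adapt a pre-trained model to a new blur kernel. With pixel values in [0, 1] and a step size of 0.1, plain gradient descent on this quadratic loss stays stable, so the internal-pair loss decreases over the 10 steps.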

翻译：如今，基于深度学习的方法在理想超分辨率（SR）数据集上展现出令人瞩目的性能，但大多数此类方法在直接应用于包含不可预测模糊核的真实世界SR重建任务时，性能会急剧下降。为解决这一问题，盲SR方法被提出以改善随机模糊核下的视觉效果，但这类方法同样会导致理想低分辨率图像的重建效果不尽如人意。本文针对理想与盲SR任务提出了一种双赢框架S2R，包含轻量级基于Transformer的SR模型（S2R Transformer）及一种新颖的由粗到精的训练策略，可在理想和随机模糊条件下均取得卓越的视觉效果。在算法层面，S2R Transformer巧妙融合高效轻量级模块，以相对较低的参数量增强提取特征的表示能力。在训练策略方面，首先通过大规模外部数据集进行粗粒度学习过程以提高网络泛化能力；随后开发快速微调过程，通过挖掘图像内部特征将预训练模型迁移至真实世界SR任务。实验结果表明，所提出的S2R在理想SR条件下以仅578K参数优于其他单图像SR模型；同时在盲模糊条件下，仅需10次梯度更新即可达到优于常规盲SR模型的视觉效果，收敛速度提升300倍，显著加速了真实场景中的迁移学习过程。