While some studies have shown that the Swin Transformer (SwinT) with window self-attention (WSA) is suitable for single image super-resolution (SR), SwinT ignores broad regions when reconstructing high-resolution images because of its limited window and shift sizes. In addition, many deep learning SR methods suffer from intensive computation. To address these problems, we introduce the N-Gram context to the image domain for the first time. We define an N-Gram as a group of neighboring local windows in SwinT, which differs from text analysis, where an N-Gram is a sequence of consecutive characters or words. N-Grams interact with each other through sliding-WSA, expanding the region seen when restoring degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with an SCDP bottleneck that takes all outputs of the hierarchical encoder. Experimental results show that NGswin achieves performance competitive with previous leading methods while maintaining an efficient structure. Moreover, we improve other SwinT-based SR methods with the N-Gram context, building an enhanced model: SwinIR-NG. Our improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Code will be available soon.
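To make the N-Gram idea concrete, the following is a minimal NumPy sketch of window-level tokens interacting through a sliding neighborhood of attention. It is not the authors' NGswin implementation and makes several labeled assumptions: the function names `window_tokens` and `ngram_context` are hypothetical, mean pooling stands in for a learned per-window embedding, the attention uses identity query/key/value projections, and each window attends to a centered n x n block of neighboring windows clipped at the borders. It only illustrates how letting a window attend to its neighbors widens the region that can influence that window's output.

```python
import numpy as np


def window_tokens(img: np.ndarray, m: int) -> np.ndarray:
    """Summarize each non-overlapping m x m window by the mean of its pixels.

    Assumption: mean pooling is a stand-in for a learned window embedding.
    img has shape (H, W, C) with H and W divisible by m.
    Returns an array of shape (H // m, W // m, C), one token per window.
    """
    h, w, c = img.shape
    assert h % m == 0 and w % m == 0, "H and W must be divisible by the window size"
    return img.reshape(h // m, m, w // m, m, c).mean(axis=(1, 3))


def ngram_context(tokens: np.ndarray, n: int) -> np.ndarray:
    """Sliding self-attention over an n x n neighborhood of window tokens.

    Each window token (query) attends to itself and its neighboring windows
    (keys/values), i.e. to its N-Gram, so its output mixes information from
    beyond its own window. Assumptions: identity projections, odd n, and a
    centered neighborhood clipped at the grid borders (no padding).
    """
    assert n % 2 == 1, "use an odd neighborhood size"
    gh, gw, c = tokens.shape
    r = n // 2
    out = np.empty_like(tokens)
    for i in range(gh):
        for j in range(gw):
            # Neighborhood bounds, clipped so border windows see fewer neighbors.
            i0, i1 = max(i - r, 0), min(i + r + 1, gh)
            j0, j1 = max(j - r, 0), min(j + r + 1, gw)
            q = tokens[i, j]                            # query: this window
            kv = tokens[i0:i1, j0:j1].reshape(-1, c)    # keys/values: neighbor windows
            scores = kv @ q / np.sqrt(c)                # scaled dot-product scores
            weights = np.exp(scores - scores.max())     # numerically stable softmax
            weights /= weights.sum()
            out[i, j] = weights @ kv                    # weighted sum of neighbor tokens
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((16, 16, 8))   # toy feature map: 16x16 pixels, 8 channels
    tok = window_tokens(img, m=4)   # 4x4 grid of window tokens
    ctx = ngram_context(tok, n=3)   # each window attends to a 3x3 block of windows
    print(tok.shape, ctx.shape)     # (4, 4, 8) (4, 4, 8)
```

Because window (1, 1) has window (0, 0) in its 3x3 neighborhood, perturbing a pixel inside window (0, 0) changes the context computed for window (1, 1) but leaves window (3, 3) unchanged. Within a single layer, plain non-shifted WSA would confine that pixel's influence to its own window.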