Although numerous solutions have been proposed for image super-resolution (SR), they are usually ill-suited to low-power devices with tight computational and memory budgets. In this paper, we address this problem by proposing a simple yet effective deep network that performs image super-resolution efficiently. Specifically, we develop a spatially-adaptive feature modulation (SAFM) mechanism on top of a vision transformer (ViT)-like block. Within this block, we first apply the SAFM layer to the input features to dynamically select representative features. Because the SAFM layer processes the input features from a long-range perspective, we further introduce a convolutional channel mixer (CCM) that simultaneously extracts local contextual information and performs channel mixing. Extensive experimental results show that the proposed method has $3\times$ fewer network parameters than state-of-the-art efficient SR methods, e.g., IMDN, and requires less computation while achieving comparable performance. The code is available at https://github.com/sunny2109/SAFMN.
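To make the block layout described above concrete, the following is a toy sketch in plain Python; it is not the authors' implementation. Every function name here (`modulate`, `channel_mix`, `block`) and every design detail is an assumption made for illustration. Modulation is approximated by pooling each channel at its own scale, upsampling back, and using a GELU-activated map to reweight the input. Channel mixing is reduced to a per-pixel linear map. Normalization, learned convolutions, and the local (spatial) part of the CCM are omitted.

```python
import math


def gelu(v):
    # Exact GELU using the Gaussian error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def max_pool(fmap, s):
    # Non-overlapping s x s max pooling on a 2-D list (H and W divisible by s).
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i + di][j + dj] for di in range(s) for dj in range(s))
             for j in range(0, w, s)]
            for i in range(0, h, s)]


def upsample_nearest(fmap, s):
    # Nearest-neighbour upsampling by an integer factor s.
    return [[fmap[i // s][j // s] for j in range(len(fmap[0]) * s)]
            for i in range(len(fmap) * s)]


def add(a, b):
    # Element-wise sum of two feature stacks shaped [C][H][W].
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def modulate(features, scales):
    # Assumed stand-in for SAFM: channel c gathers coarse context at scales[c],
    # and the activated context map reweights the original features.
    out = []
    for fmap, s in zip(features, scales):
        context = upsample_nearest(max_pool(fmap, s), s)
        out.append([[x * gelu(c) for x, c in zip(row_x, row_c)]
                    for row_x, row_c in zip(fmap, context)])
    return out


def channel_mix(features, weights):
    # Assumed stand-in for CCM: per-pixel linear mixing across channels
    # (equivalent to a 1x1 convolution); the local spatial part is omitted.
    h, w = len(features[0]), len(features[0][0])
    return [[[sum(wk * features[k][i][j] for k, wk in enumerate(w_row))
              for j in range(w)]
             for i in range(h)]
            for w_row in weights]


def block(features, scales, weights):
    # ViT-like layout: modulation, then channel mixing, each with a residual.
    x = add(features, modulate(features, scales))
    return add(x, channel_mix(x, weights))


if __name__ == "__main__":
    feats = [[[1.0] * 4 for _ in range(4)],
             [[float(4 * i + j) for j in range(4)] for i in range(4)]]
    out = block(feats, scales=[1, 2], weights=[[0.5, 0.5], [0.5, -0.5]])
    print(len(out), len(out[0]), len(out[0][0]))
```

In this sketch the residual connections mirror the two-stage structure of a transformer block, with the modulation step in place of self-attention and the channel mixer in place of the feed-forward network. A practical version would use learned convolutions in a deep-learning framework.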