Recent progress in single-image super-resolution (SISR) has delivered remarkable performance, yet the computational cost of these methods remains a challenge for deployment on resource-constrained devices. This is especially true of transformer-based methods: the self-attention mechanism has driven major breakthroughs but incurs substantial computational overhead. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and the ConvFormer-based Super-Resolution network (CFSR), which offer an effective and efficient solution for lightweight image super-resolution. Specifically, CFSR replaces the self-attention module with a large-kernel convolution as the feature mixer, efficiently modeling long-range dependencies and providing a large receptive field at a modest computational cost. Furthermore, we propose an edge-preserving feed-forward network (EFN) that aggregates local features while preserving more high-frequency information. Extensive experiments demonstrate that CFSR achieves a better trade-off between computational cost and performance than existing lightweight SR methods. Compared with state-of-the-art methods such as ShuffleMixer, CFSR gains 0.39 dB on the Urban100 dataset for the ×2 SR task while using 26% fewer parameters and 31% fewer FLOPs. Code and pre-trained models are available at https://github.com/Aitical/CFSR.
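The core architectural idea, using a large-kernel depthwise convolution as the token mixer in place of self-attention, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository holds that): the block layout (a pre-norm residual mixer followed by a convolutional feed-forward), the 13×13 kernel, the 2× expansion, and every function name below are hypothetical choices, and the edge-preserving mechanism of EFN is not modeled. The two cost helpers make the efficiency argument concrete: global self-attention grows quadratically with the number of pixels, while a depthwise convolution grows linearly.

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Per-channel k x k cross-correlation, zero 'same' padding, stride 1.
    x: (C, H, W) feature map; w: (C, k, k) kernels with odd k."""
    c, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros(x.shape, dtype=np.result_type(x, w))
    for i in range(k):  # accumulate one kernel tap at a time
        for j in range(k):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def pointwise_conv(x, w):
    """1x1 convolution, i.e. channel mixing; w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def channel_layer_norm(x, eps=1e-6):
    """Normalize each pixel's channel vector (no learnable affine, for brevity)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def convformer_block(x, p):
    """Hypothetical MetaFormer-style block: large-kernel conv mixer, then conv FFN."""
    # Token mixer: a large-kernel depthwise conv stands in for self-attention.
    y = pointwise_conv(channel_layer_norm(x), p["mix_in"])
    y = depthwise_conv2d(y, p["mix_dw"])
    x = x + pointwise_conv(y, p["mix_out"])
    # Feed-forward: expand, 3x3 depthwise conv for local aggregation, project back.
    # Generic stand-in only; the paper's edge-preserving EFN is not reproduced.
    y = pointwise_conv(channel_layer_norm(x), p["ffn_in"])
    y = gelu(depthwise_conv2d(y, p["ffn_dw"]))
    return x + pointwise_conv(y, p["ffn_out"])

def init_params(c, k=13, expand=2, seed=0):
    """Random weights for illustration; k=13 and expand=2 are assumed values."""
    rng = np.random.default_rng(seed)
    hidden = c * expand
    return {
        "mix_in": rng.normal(0.0, c ** -0.5, (c, c)),
        "mix_dw": rng.normal(0.0, 1.0 / k, (c, k, k)),
        "mix_out": rng.normal(0.0, c ** -0.5, (c, c)),
        "ffn_in": rng.normal(0.0, c ** -0.5, (hidden, c)),
        "ffn_dw": rng.normal(0.0, 1.0 / 3, (hidden, 3, 3)),
        "ffn_out": rng.normal(0.0, hidden ** -0.5, (c, hidden)),
    }

def attention_macs(h, w, c):
    """Multiply-accumulates of global self-attention over N = h*w tokens:
    Q K^T costs N*N*c and the weighted sum over V another N*N*c
    (Q/K/V and output projections excluded)."""
    n = h * w
    return 2 * n * n * c

def dwconv_macs(h, w, c, k):
    """Multiply-accumulates of a k x k depthwise conv: linear in h*w."""
    return h * w * c * k * k

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(16, 24, 24))
    out = convformer_block(x, init_params(16))
    print(out.shape)  # (16, 24, 24)
    print(attention_macs(64, 64, 48) / dwconv_macs(64, 64, 48, 13))
```

For a 64×64 feature map with 48 channels, the attention term is roughly 48 times the cost of a 13×13 depthwise convolution, and because the ratio is 2N/k², the gap widens as resolution grows. This counts only the mixing step, not the full networks compared above.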