Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach

Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational cost of these methods remains a challenge for deployment on resource-constrained devices. This is especially true for transformer-based methods: the self-attention mechanism has brought major breakthroughs but incurs substantial computational cost. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and the ConvFormer-based Super-Resolution network (CFSR), which offer an effective and efficient solution for lightweight image super-resolution. Specifically, CFSR uses large-kernel convolution as the feature mixer in place of the self-attention module, modeling long-range dependencies and obtaining a large receptive field at a small computational cost. Furthermore, we propose an edge-preserving feed-forward network, abbreviated as EFN, which aggregates local features while preserving more high-frequency information. Extensive experiments demonstrate that CFSR achieves a better trade-off between computational cost and performance than existing lightweight SR methods. Compared with state-of-the-art methods such as ShuffleMixer, CFSR achieves a 0.39 dB gain on the Urban100 dataset for the ×2 SR task while using 26% fewer parameters and 31% fewer FLOPs. Code and pre-trained models are available at https://github.com/Aitical/CFSR.
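
The idea of swapping self-attention for a large-kernel convolution mixer can be illustrated with a short NumPy sketch. It is not the CFSR implementation: the cost formulas are standard approximations that ignore softmax, normalization, and biases, and the mixer is a hypothetical residual-plus-depthwise-convolution stand-in. The actual ConvFormer layer, its kernel size, and its FLOP accounting are defined in the paper and the linked repository. The function names `attention_macs`, `dwconv_macs`, `depthwise_conv2d`, and `large_kernel_mixer` are illustrative only.

```python
import numpy as np


def attention_macs(h: int, w: int, c: int) -> int:
    """Approximate multiply-accumulates (MACs) of one global self-attention layer.

    Counts the Q/K/V/output projections (4*N*C^2) plus Q @ K^T and the
    attention-weighted sum over V (2*N^2*C), where N = h*w tokens.
    Softmax, normalization, and biases are ignored.
    """
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def dwconv_macs(h: int, w: int, c: int, k: int) -> int:
    """MACs of a k x k depthwise convolution with 'same' output size."""
    return h * w * c * k * k


def depthwise_conv2d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depthwise 2-D cross-correlation with zero padding and 'same' output size.

    x: feature map of shape (C, H, W).
    kernels: one odd-sized k x k filter per channel, shape (C, k, k).
    """
    _, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each tap (i, j) weights a shifted copy of its own channel only,
            # so channels never mix (that is what makes it depthwise).
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def large_kernel_mixer(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Hypothetical token mixer: a residual connection around a large-kernel
    depthwise convolution, standing in for self-attention (assumption)."""
    return x + depthwise_conv2d(x, kernels)
```

With an assumed 64×64 feature map, 48 channels, and a 13×13 kernel (illustrative values, not taken from the paper), `attention_macs(64, 64, 48)` gives 1,648,361,472 MACs while `dwconv_macs(64, 64, 48, 13)` gives 33,226,752. The attention cost grows quadratically with the number of pixels N = HW, whereas the depthwise convolution grows linearly in N and quadratically only in the kernel size k. This is why a large kernel can widen the receptive field while staying cheap.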