While lightweight ViT frameworks have made tremendous progress in image super-resolution, their uni-dimensional self-attention modeling and homogeneous aggregation scheme prevent the effective receptive field (ERF) from covering more comprehensive interactions across both the spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on the dense interaction principle; it simultaneously models pixel interactions along both the spatial and channel dimensions, mining the potential correlations across the omni-axis (i.e., spatial and channel). Coupled with mainstream window partitioning strategies, OSA achieves superior performance with a compelling computational budget. Second, a multi-scale interaction scheme is proposed to mitigate the sub-optimal ERF (i.e., premature saturation) in shallow models; it facilitates local propagation and meso-/global-scale interactions, yielding an omni-scale aggregation building block. Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95 dB on Urban100 $\times 4$ with only 792K parameters). Our code is available at \url{https://github.com/Francis0625/Omni-SR}.
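To make the idea of attending along both axes concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single window of N pixels with C channels, identity query/key/value projections, and a simple cascade of spatial attention followed by channel attention. The function names (`self_attention`, `omni_attention_sketch`) and the toy input are hypothetical; the actual OSA block in the Omni-SR repository may differ in its projections, scaling, and ordering.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    dim = len(tokens[0])
    scores = matmul(tokens, transpose(tokens))
    scaled = [[s / math.sqrt(dim) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), tokens)


def omni_attention_sketch(window):
    """window: N pixels x C channels from one partitioned window."""
    # Spatial axis: each pixel is a token, giving an N x N attention map.
    spatial = self_attention(window)
    # Channel axis: each channel is a token, giving a C x C attention map.
    channel = self_attention(transpose(spatial))
    # Return to the original N x C layout.
    return transpose(channel)


# Toy window: 4 pixels, 3 channels (made-up values for illustration).
window = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [0.0, 1.0, 1.0], [2.0, 0.5, 0.5]]
out = omni_attention_sketch(window)
```

Because both steps only form softmax-weighted averages, the output keeps the N x C shape of the window and every value stays within the range of the input; the learned projections that a real block would use are what give it expressive power beyond this averaging.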