Although many recent works have advanced the image restoration (IR) field, they often suffer from an excessive number of parameters. Another issue is that most Transformer-based IR methods focus on either local or global features only, which leads to limited receptive fields or parameter deficiency issues. To address these problems, we propose a lightweight IR network, the Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of heads. The two attentions complement each other's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information losses and exploits semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. The experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Code will be available soon.
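The abstract describes D-RAMiT only at a high level. The following NumPy sketch is a hypothetical illustration of the core idea (spatial and channel self-attention computed in parallel with different head counts, then mixed), not the authors' implementation. Its assumptions are as follows. Channels are split between a spatial branch and a channel branch in proportion to their head counts. Spatial attention runs over all tokens rather than within windows. Channel attention uses a plain 1/sqrt(N) scaling. Mixing is a single linear projection of the concatenated outputs. The reciprocal exchange between branches, the H-RAMi layer, and the MobileNet-style convolutions are omitted. All function names and weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(q, k, v, heads):
    # q, k, v: (N, C). Each head attends over the N tokens (pixels).
    n, c = q.shape
    d = c // heads
    q, k, v = (t.reshape(n, heads, d).transpose(1, 0, 2) for t in (q, k, v))  # (heads, N, d)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    return (attn @ v).transpose(1, 0, 2).reshape(n, c)

def channel_attention(q, k, v, heads):
    # q, k, v: (N, C). Each head attends over its d channels (transposed attention).
    n, c = q.shape
    d = c // heads
    q, k, v = (t.reshape(n, heads, d).transpose(1, 2, 0) for t in (q, k, v))  # (heads, d, N)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(n))  # (heads, d, d); simplified scaling
    return (attn @ v).transpose(2, 0, 1).reshape(n, c)

def bidimensional_attention_mixing(x, w_qkv, w_mix, spatial_heads, channel_heads, head_dim):
    # Hypothetical D-RAMiT-style step: split channels between the two branches,
    # run both attentions in parallel, then mix the concatenated outputs linearly.
    q, k, v = np.split(x @ w_qkv, 3, axis=1)  # each (N, C)
    cs = spatial_heads * head_dim              # channels assigned to the spatial branch
    sp = spatial_attention(q[:, :cs], k[:, :cs], v[:, :cs], spatial_heads)
    ch = channel_attention(q[:, cs:], k[:, cs:], v[:, cs:], channel_heads)
    return np.concatenate([sp, ch], axis=1) @ w_mix

# Usage: 16 tokens (e.g. a 4x4 window), 3 spatial heads and 1 channel head of width 4.
rng = np.random.default_rng(0)
N, head_dim, sh, chh = 16, 4, 3, 1
C = (sh + chh) * head_dim
x = rng.standard_normal((N, C))
w_qkv = rng.standard_normal((C, 3 * C)) / np.sqrt(C)
w_mix = rng.standard_normal((C, C)) / np.sqrt(C)
y = bidimensional_attention_mixing(x, w_qkv, w_mix, sh, chh, head_dim)
print(y.shape)  # (16, 16)
```

Assigning different head counts to the two branches (here three spatial, one channel) changes how the channel budget is shared between the two attention types without changing the block's input or output width.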