Although many recent works have advanced the field of image restoration (IR), they often suffer from an excessive number of parameters. Another issue is that most Transformer-based IR methods focus on either local or global features alone, resulting in limited receptive fields or deficient parameters. To address these problems, we propose a lightweight IR network, Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attention in parallel, using a different number of heads for each dimension. The two attentions compensate for each other's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information loss and exploits semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. Experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Code is available at https://github.com/rami0205/RAMiT.
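To make the bi-dimensional attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the repository above for that). It assumes a flattened window of N pixels with C channels, uses the input directly as query, key, and value with no learned projections, and mixes the two branches by plain averaging; the function names, head counts, scaling factors, and the averaging step are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def spatial_attention(x, heads):
    """Attend across the N pixels; each head sees C // heads channels."""
    n, c = x.shape
    d = c // heads
    t = x.reshape(n, heads, d).transpose(1, 0, 2)           # (heads, N, d)
    attn = softmax(t @ t.transpose(0, 2, 1) / np.sqrt(d))   # (heads, N, N)
    out = attn @ t                                          # (heads, N, d)
    return out.transpose(1, 0, 2).reshape(n, c)


def channel_attention(x, heads):
    """Attend across channels; each head builds a (d, d) attention map."""
    n, c = x.shape
    d = c // heads
    t = x.reshape(n, heads, d).transpose(1, 2, 0)           # (heads, d, N)
    attn = softmax(t @ t.transpose(0, 2, 1) / np.sqrt(n))   # (heads, d, d)
    out = attn @ t                                          # (heads, d, N)
    return out.transpose(2, 0, 1).reshape(n, c)


def mix_bidimensional(x, spatial_heads=2, channel_heads=4):
    """Run both branches on the same input and average them.

    Averaging is a placeholder assumption for the mixing step.
    """
    spatial = spatial_attention(x, spatial_heads)
    channel = channel_attention(x, channel_heads)
    return 0.5 * (spatial + channel)


rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8))  # 4x4 window flattened: 16 pixels, 8 channels
mixed = mix_bidimensional(features)
```

The two branches differ only in which axis plays the role of the token: the spatial branch builds an N x N map per head, while the channel branch builds a much smaller d x d map per head, which is one reason the two can reasonably use different head counts.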