Although many recent works have advanced the field of image restoration (IR), they often suffer from an excessive number of parameters. Another issue is that most Transformer-based IR methods focus on either local or global features alone, resulting in limited receptive fields or parameter deficiencies. To address these problems, we propose a lightweight IR network, Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attention in parallel with different numbers of attention heads. The two attentions compensate for each other's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information losses and utilizes semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. The experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Code is available at https://github.com/rami0205/RAMiT.
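The idea of computing spatial and channel self-attention in parallel, with a different number of heads for each, can be illustrated with a minimal sketch. This is not the RAMiT implementation: it assumes no learned projections (queries, keys, and values are the input itself), no windowing, and a plain element-wise average as a stand-in for the mixing step, whose actual form the abstract does not specify. All function names are hypothetical, and plain Python lists are used so the example runs without dependencies.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Scaled dot-product attention; each row of q, k, v is one token."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def spatial_attention(x, num_heads):
    """Pixels are tokens: each head attends over pixels using its channel slice."""
    width = len(x[0]) // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * width:(h + 1) * width] for row in x]
        heads.append(attention(part, part, part))  # assumption: q = k = v
    # Concatenate the head outputs back along the channel axis.
    return [[v for head in heads for v in head[i]] for i in range(len(x))]

def channel_attention(x, num_heads):
    """Channels are tokens: each head attends over its own group of channels."""
    xt = transpose(x)  # channels x pixels
    width = len(xt) // num_heads
    out = []
    for h in range(num_heads):
        group = xt[h * width:(h + 1) * width]
        out.extend(attention(group, group, group))  # assumption: q = k = v
    return transpose(out)  # back to pixels x channels

def bi_dimensional_attention(x, spatial_heads, channel_heads):
    """Run both attentions on the same input and mix them.

    Assumption: averaging is only a placeholder for the paper's mixing step.
    """
    s = spatial_attention(x, spatial_heads)
    c = channel_attention(x, channel_heads)
    return [[0.5 * (a + b) for a, b in zip(rs, rc)] for rs, rc in zip(s, c)]

# Toy feature map: 4 pixels (rows) x 4 channels (columns).
x = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.1, 0.0, 0.2],
     [0.3, 0.3, 0.9, 0.1],
     [0.7, 0.6, 0.2, 0.8]]
y = bi_dimensional_attention(x, spatial_heads=2, channel_heads=1)
print(len(y), len(y[0]))  # output keeps the input's pixels x channels layout
```

The spatial branch builds a pixel-by-pixel attention map per head, while the channel branch builds a channel-by-channel map, so the two branches capture different dependencies over the same features. That difference is what makes mixing them worthwhile.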