Although many recent works have advanced the field of image restoration (IR), they often suffer from an excessive number of parameters. Moreover, most Transformer-based IR methods focus on either local or global features alone, which leads to limited receptive fields or inefficient use of parameters. To address these problems, we propose a lightweight IR network, the Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attention in parallel, using a different number of heads for each dimension. The two attentions compensate for each other's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information loss and exploits semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. Experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Code will be made available soon.
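
To make the idea of parallel bi-dimensional attention more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It computes spatial self-attention (attention maps over token positions) and channel self-attention (attention maps over feature channels) on the same input with different head counts, then mixes the two by concatenation and a linear projection. The function names, the head counts (4 and 2), the scaling factors, the random weights, and the concatenate-and-project mixing step are all assumptions chosen for illustration; the reciprocal interaction between the two branches described in the abstract is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(q, k, v, heads):
    """Self-attention over token positions; each head's map is (N, N)."""
    n, c = q.shape
    d = c // heads
    # (N, C) -> (heads, N, d)
    qh, kh, vh = (t.reshape(n, heads, d).transpose(1, 0, 2) for t in (q, k, v))
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = attn @ vh                                          # (heads, N, d)
    return out.transpose(1, 0, 2).reshape(n, c)


def channel_attention(q, k, v, heads):
    """Self-attention over channels; each head's map is (d, d)."""
    n, c = q.shape
    d = c // heads
    # (N, C) -> (heads, d, N)
    qh, kh, vh = (t.reshape(n, heads, d).transpose(1, 2, 0) for t in (q, k, v))
    # Scaling by sqrt(N) is an assumption made for this sketch.
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(n))  # (heads, d, d)
    out = attn @ vh                                          # (heads, d, N)
    return out.transpose(2, 0, 1).reshape(n, c)


def parallel_attention_mix(x, params, spatial_heads=4, channel_heads=2):
    """Run both attentions in parallel on x, then mix them (hypothetical mixing)."""
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    sp = spatial_attention(q, k, v, spatial_heads)
    ch = channel_attention(q, k, v, channel_heads)
    mixed = np.concatenate([sp, ch], axis=-1)  # (N, 2C)
    return mixed @ params["wo"]                # (N, C)


rng = np.random.default_rng(0)
n, c = 16, 8  # 16 tokens (e.g. a 4x4 window), 8 channels
x = rng.standard_normal((n, c))
params = {name: rng.standard_normal((c, c)) * 0.1 for name in ("wq", "wk", "wv")}
params["wo"] = rng.standard_normal((2 * c, c)) * 0.1

y = parallel_attention_mix(x, params)
print(y.shape)  # (16, 8)
```

Note the cost asymmetry that motivates combining the two: the spatial attention map grows with the number of tokens (N × N per head), whereas the channel attention map depends only on the per-head channel width (d × d per head) yet aggregates information from all N positions.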