Recent years have seen significant advances in image restoration, largely attributable to modern deep neural networks such as CNNs and Transformers. However, existing restoration backbones often face a trade-off between a global receptive field and efficient computation, which hinders their use in practice. Recently, the Selective Structured State Space Model, especially its improved version Mamba, has shown great potential for modeling long-range dependencies with linear complexity, offering a way to resolve this trade-off. However, the standard Mamba still faces challenges in low-level vision, such as local pixel forgetting and channel redundancy. In this work, we propose a simple but effective baseline, named MambaIR, which adds both local enhancement and channel attention to the vanilla Mamba. In this way, MambaIR exploits local pixel similarity and reduces channel redundancy. Extensive experiments demonstrate the superiority of our method: for example, MambaIR outperforms SwinIR by up to 0.45 dB on image super-resolution (SR) at a similar computational cost while retaining a global receptive field. Code is available at \url{https://github.com/csguoh/MambaIR}.
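To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the MambaIR implementation. It assumes a diagonal, time-invariant state space recurrence (Mamba additionally makes the parameters input-dependent, which is omitted here) and a squeeze-and-excitation style channel attention; the function names `ssm_scan` and `channel_attention` and all parameter shapes are hypothetical choices for illustration.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Sequential scan of a diagonal discrete state space model.

    Assumed recurrence (illustrative, time-invariant):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum over the state dimension of c * h_t
    x: (L, D) input sequence; a, b, c: (D, N) per-channel parameters.
    The loop visits each of the L steps once, so cost grows linearly in L.
    """
    length, dim = x.shape
    h = np.zeros_like(a, dtype=float)      # hidden state, shape (D, N)
    y = np.empty((length, dim))
    for t in range(length):
        h = a * h + b * x[t][:, None]      # broadcast x_t over the state dim
        y[t] = (c * h).sum(axis=1)         # read out one value per channel
    return y


def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel gating.

    feat: (C, H, W) feature map; w1: (C_r, C); w2: (C, C_r),
    where C_r is a reduced channel width chosen by the caller.
    """
    squeezed = feat.mean(axis=(1, 2))              # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]              # rescale each channel
```

In this sketch each output `y[t]` depends on every earlier input through the running state `h`, which is how a single linear-time pass can carry long-range context, while the channel gate down-weights channels whose pooled response is uninformative.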