Transformers are currently the most popular architecture for image dehazing, but their high computational complexity limits their ability to model long-range dependencies on resource-constrained devices. To address this challenge, we introduce the U-shaped Vision Mamba (UVM-Net), an efficient single-image dehazing network. Inspired by State Space Sequence Models (SSMs), a recent class of deep sequence models known for handling long sequences, we design a Bi-SSM block that combines the local feature extraction ability of convolutional layers with the ability of SSMs to capture long-range dependencies. Extensive experimental results demonstrate the effectiveness of our method, which offers a more efficient approach to long-range dependency modeling for image dehazing and other image restoration tasks. Our method takes only \textbf{0.009} seconds to infer a $325 \times 325$ resolution image (about 100 FPS), excluding I/O handling time. The code is available at \url{https://github.com/zzr-idam/UVM-Net}.