Deep state-space models (SSMs), such as recent Mamba architectures, are emerging as a promising alternative to CNN and Transformer networks. Existing Mamba-based restoration methods process visual data with a flatten-and-scan strategy that converts image patches into a 1D sequence before scanning. However, this scanning paradigm ignores local pixel dependencies and introduces spatial misalignment by placing spatially distant pixels adjacent in the sequence, which reduces local noise awareness and degrades image sharpness in low-level vision tasks. To overcome these issues, we propose a novel slice-and-scan strategy that alternates scanning along intra- and inter-slice directions. We further design a new Vision State Space Module (VSSM) for image deblurring and tackle the inefficiency of current Mamba-based vision modules. Building upon this, we develop XYScanNet, an SSM architecture integrated with a lightweight feature fusion module for enhanced image deblurring. XYScanNet maintains competitive distortion metrics while significantly improving perceptual quality. Experimental results show that XYScanNet improves KID by $17\%$ over the nearest competitor. Our code will be released soon.
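The difference between the two scan orders can be sketched on a toy image. The following is a minimal illustration, not the paper's exact algorithm: `slice_and_scan` and its `slice_height` parameter are our assumptions, meant only to show how serpentine scanning within horizontal slices keeps consecutive sequence positions spatially adjacent, whereas row-major flattening places row-end pixels next to the distant start of the next row.

```python
import numpy as np

# Toy 4x4 "image" whose pixel values are their own indices,
# so the 1D sequence directly shows the visiting order.
img = np.arange(16).reshape(4, 4)

# Flatten-and-scan: plain row-major flattening. The last pixel of each
# row (e.g. index 3) lands next to the first pixel of the next row
# (index 4), which is spatially far away on the 2D grid.
flat_scan = img.reshape(-1)

def slice_and_scan(x, slice_height=2):
    """Hypothetical slice-and-scan sketch (our assumption, not the
    paper's exact scheme): partition the image into horizontal slices
    and scan each slice in a serpentine order, reversing direction on
    alternate rows so consecutive sequence entries stay 2D neighbors."""
    seq = []
    for i in range(0, x.shape[0], slice_height):
        s = x[i:i + slice_height]
        for j, row in enumerate(s):
            seq.extend(row if j % 2 == 0 else row[::-1])
    return np.array(seq)

print(flat_scan.tolist())
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(slice_and_scan(img).tolist())
# [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11, 15, 14, 13, 12]
```

In the serpentine sequence every consecutive pair (e.g. 3→7, 4→8) is a vertical or horizontal neighbor on the grid, which is the locality property the flatten-and-scan order lacks.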