Outdoor vision systems are frequently contaminated by rain streaks and raindrops, which significantly degrade the performance of visual tasks and multimedia applications. Videos naturally contain redundant temporal cues that make rain removal more stable. Traditional video deraining methods rely heavily on optical flow estimation and kernel-based operations, which have limited receptive fields. Transformer architectures, by contrast, capture long-term dependencies but at a substantially higher computational cost. Recently, the linear-complexity operators of state space models (SSMs) have enabled efficient long-term temporal modeling, which is crucial for removing rain streaks and raindrops from videos. However, processing a video as a one-dimensional sequence places spatio-temporally adjacent pixels far apart in that sequence, destroying their local correlations. To address this, we present RainMamba, an improved SSM-based video deraining network with a novel Hilbert scanning mechanism that better captures sequence-level local information. We also introduce a difference-guided dynamic contrastive locality learning strategy that strengthens the network's ability to learn patch-level self-similarity. Extensive experiments on four synthetic video deraining datasets and on real-world rainy videos demonstrate the superiority of our network in removing rain streaks and raindrops.
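To illustrate why the scan order matters, the following is a minimal sketch, not RainMamba's implementation. It compares a row-by-row (raster) flattening of a 2D grid with a Hilbert-curve flattening and reports the largest spatial jump between consecutive tokens. It assumes a square grid of side 2**order and uses the classic iterative Hilbert index-to-coordinate conversion. RainMamba applies Hilbert scanning over the spatio-temporal volume of a video, whereas this sketch covers a single 2D frame only. The function names `hilbert_d2xy`, `hilbert_order`, `raster_order` and `max_step` are hypothetical.

```python
# Hypothetical sketch: compares raster vs. Hilbert flattening of one 2D frame.
# Not RainMamba's code; the paper scans the full spatio-temporal video volume.

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(order):
    """All grid cells in Hilbert-curve order (the 1D token sequence)."""
    n = 1 << order
    return [hilbert_d2xy(order, d) for d in range(n * n)]


def raster_order(order):
    """All grid cells in row-major order, i.e. naive flattening."""
    n = 1 << order
    return [(x, y) for y in range(n) for x in range(n)]


def max_step(path):
    """Largest Manhattan distance between consecutive tokens of a scan."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(path, path[1:]))


if __name__ == "__main__":
    # On an 8x8 grid, raster order jumps across a whole row at each line end,
    # while every Hilbert step moves to a 4-connected neighbor.
    print(max_step(hilbert_order(3)), max_step(raster_order(3)))
```

Running the script prints `1 8`. Each step of the Hilbert scan moves to a directly adjacent cell, while the raster scan jumps from the end of one row back to the start of the next. A feature map `frame[y][x]` could be serialized for a sequence model as `[frame[y][x] for x, y in hilbert_order(order)]`. This locality-preserving property motivates Hilbert scanning. The sketch does not model how RainMamba handles the temporal axis or its multi-direction scans.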