Rain-by-snow weather removal is a specialized task in weather-degraded image restoration that aims to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. First, we explore the proximity of convolutional networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and find experimentally that they perform comparably in intra-stage feature learning. On this basis, we use a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving the attention property of adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate the proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time consumption among the compared restoration methods. For instance, it outperforms Restormer while using 1.53% fewer parameters and 15.6% less inference time. Datasets, source code and pre-trained models are available at \url{https://github.com/chdwyb/RSFormer}.
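To make the reported percentages concrete, a relative reduction of this kind is conventionally computed as (baseline - ours) / baseline. The sketch below illustrates only that arithmetic. The function name `relative_reduction` and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements from the paper or from Restormer.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Return the percentage by which `ours` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - ours) / baseline


# Hypothetical placeholder values, NOT figures reported in the paper.
baseline_params_m = 100.0  # baseline parameter count, in millions
ours_params_m = 98.0       # compared model's parameter count, in millions
baseline_time_ms = 200.0   # baseline inference time per image, in ms
ours_time_ms = 170.0       # compared model's inference time per image, in ms

param_drop = relative_reduction(baseline_params_m, ours_params_m)
time_drop = relative_reduction(baseline_time_ms, ours_time_ms)

print(f"parameters: {param_drop:.2f}% fewer")
print(f"inference time: {time_drop:.2f}% less")
```

A negative return value would mean the compared model is larger or slower than the baseline, so the same helper can be used to check a trade-off in either direction.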