Recently, Transformer-based architectures have been introduced into the single image deraining task owing to their advantage in modeling non-local information. However, existing approaches tend to integrate global features based on a dense self-attention strategy, which uses all similarities between query and key tokens. In practice, this strategy tends to overlook the most relevant information and introduces blurry effects from irrelevant representations during feature aggregation. To this end, this paper proposes an effective image deraining Transformer with dynamic dual self-attention (DDSA), which combines dense and sparse attention strategies to better facilitate clear image reconstruction. Specifically, to achieve sparse attention, we select only the most useful similarity values based on a top-k approximate calculation. In addition, we develop a novel spatial-enhanced feed-forward network (SEFN) to obtain more accurate representations and thereby produce high-quality derained results. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method.
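To make the dense-versus-sparse distinction concrete, the sketch below contrasts standard (dense) scaled dot-product attention with a top-k sparse variant in which each query keeps only its k largest query-key similarities and masks the rest before the softmax. This is a minimal illustration under stated assumptions, not the authors' DDSA: it uses exact top-k selection (via `np.argpartition`) rather than an approximate calculation, a single head with no learned projections, and a hypothetical fusion rule `dual_attention` that mixes the two outputs with a fixed scalar weight `alpha`. The paper's actual combination scheme is not described in this abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf get zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # q, k: (n, d); v: (n, d_v). Every query attends to all n keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def topk_sparse_attention(q, k, v, top_k):
    # Same scores as dense attention, but each row keeps only its top_k
    # largest similarities; the others are set to -inf so the softmax
    # renormalizes over the selected keys only. Requires 1 <= top_k <= n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    return softmax(scores + mask) @ v


def dual_attention(q, k, v, top_k, alpha=0.5):
    # Hypothetical fusion (an assumption, not the paper's DDSA rule):
    # a convex combination of the dense and sparse attention outputs.
    return alpha * dense_attention(q, k, v) + (1.0 - alpha) * topk_sparse_attention(q, k, v, top_k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 4))  # 6 tokens, 4-dim queries/keys
    k = rng.standard_normal((6, 4))
    v = rng.standard_normal((6, 3))
    print(dual_attention(q, k, v, top_k=2).shape)
```

Setting `top_k` equal to the number of tokens recovers dense attention exactly, while `top_k=1` makes each query copy the single most similar value vector. Masking with `-inf` before the softmax (rather than zeroing weights afterwards) keeps each row a proper probability distribution over the retained keys, which is what lets irrelevant tokens drop out of the aggregation entirely.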