Although Convolutional Neural Networks (CNNs) have made good progress in image restoration, the intrinsic translation equivariance and locality of convolutions still limit further gains in image quality. Vision Transformers and self-attention have recently achieved promising results on a range of computer vision tasks, but applying Transformers directly to image restoration remains challenging. In this paper, we introduce SandFormer, an efficient hybrid architecture for sand-dust image restoration that combines the local features extracted by a CNN with the long-range dependencies captured by a Transformer. To resolve the feature inconsistency between the Transformer and CNN branches, SandFormer uses gated fusion: rather than simply adding or concatenating the two feature sets, it modulates the features from the CNN-based and Transformer-based branches so that each representation complements the other. Experiments show that SandFormer achieves significant performance improvements over previous sand-dust image restoration methods on both synthetic and real dust scenes.
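The abstract does not spell out the fusion rule, so the following is only a hedged illustration of one common form of gated fusion, not SandFormer's actual module. It assumes a per-pixel, per-channel sigmoid gate computed by a hypothetical 1×1 projection (`proj_w`, `proj_b`) of the concatenated CNN and Transformer feature maps, and uses that gate to blend the two branches.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(cnn_feat: np.ndarray, trans_feat: np.ndarray,
                 proj_w: np.ndarray, proj_b: np.ndarray):
    """Blend CNN and Transformer features with a learned gate (illustrative sketch).

    cnn_feat, trans_feat: feature maps of shape (C, H, W) from the two branches.
    proj_w: (C, 2C) weights of a hypothetical 1x1 projection.
    proj_b: (C,) bias of that projection.
    Returns the fused map and the gate, both of shape (C, H, W).
    """
    c, h, wd = cnn_feat.shape
    # Stack both branches along channels and flatten space: (2C, H*W).
    stacked = np.concatenate([cnn_feat, trans_feat], axis=0).reshape(2 * c, h * wd)
    # A 1x1 convolution is a per-pixel matrix multiply over channels.
    gate = sigmoid(proj_w @ stacked + proj_b[:, None]).reshape(c, h, wd)
    # Gate near 1 favors CNN (local) features, near 0 favors Transformer (global) ones.
    fused = gate * cnn_feat + (1.0 - gate) * trans_feat
    return fused, gate


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
cnn = rng.standard_normal((C, H, W))     # stand-in for CNN-branch features
trans = rng.standard_normal((C, H, W))   # stand-in for Transformer-branch features
proj_w = rng.standard_normal((C, 2 * C)) * 0.1
proj_b = np.zeros(C)

fused, gate = gated_fusion(cnn, trans, proj_w, proj_b)
print(fused.shape)  # (4, 8, 8)
```

Because the gate produces a convex combination, every fused value lies between the corresponding CNN and Transformer values, so the network can, in principle, weight local or global features differently at each pixel. Plain addition weights both branches equally everywhere, and concatenation leaves the mixing to later layers.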