In recent years, Transformer networks have begun to replace pure convolutional neural networks (CNNs) in computer vision, owing to their global receptive field and their adaptability to the input. However, the quadratic computational complexity of softmax-attention limits their wide application to image dehazing, especially for high-resolution images. To address this issue, we propose a new Transformer variant that applies a Taylor expansion to approximate softmax-attention and achieves linear computational complexity. A multi-scale attention refinement module is proposed as a complement to correct the error of the Taylor expansion. Furthermore, we introduce a multi-branch architecture with multi-scale patch embedding into the proposed Transformer, which embeds features using overlapping deformable convolutions of different scales. The design of the multi-scale patch embedding is based on three key ideas: 1) various sizes of the receptive field; 2) multi-level semantic information; 3) flexible shapes of the receptive field. Our model, named Multi-branch Transformer expanded by Taylor formula (MB-TaylorFormer), can embed coarse-to-fine features more flexibly at the patch embedding stage and capture long-distance pixel interactions at limited computational cost. Experimental results on several dehazing benchmarks show that MB-TaylorFormer achieves state-of-the-art (SOTA) performance with a light computational burden. The source code and pre-trained models are available at https://github.com/FVL2020/ICCV-2023-MB-TaylorFormer.
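To make the linear-complexity claim concrete, the following is a minimal sketch of how a Taylor-expanded attention can avoid the N x N weight matrix. It is not taken from the released code. It assumes a first-order expansion, exp(q·k) ≈ 1 + q·k, and L2-normalized queries and keys so that each weight stays non-negative; the abstract does not state the expansion order or the normalization, and the function names are hypothetical. The multi-scale attention refinement module and the deformable patch embedding are not modeled here. Plain Python is used so that the sketch runs without dependencies.

```python
import math
import random


def l2_normalize(rows, eps=1e-8):
    """Scale each row vector to (approximately) unit L2 norm."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) + eps
        out.append([x / norm for x in r])
    return out


def taylor_attention_quadratic(q, k, v):
    """Reference version: forms all N x N weights w_ij = 1 + q_i . k_j."""
    q, k = l2_normalize(q), l2_normalize(k)
    d_v = len(v[0])
    out = []
    for qi in q:
        w = [1.0 + sum(a * b for a, b in zip(qi, kj)) for kj in k]
        total = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / total
                    for c in range(d_v)])
    return out


def taylor_attention_linear(q, k, v):
    """Same result, but cost grows linearly with the number of tokens N.

    sum_j (1 + q_i . k_j) v_j = sum_j v_j + q_i (K^T V)
    sum_j (1 + q_i . k_j)     = N + q_i . sum_j k_j
    """
    q, k = l2_normalize(q), l2_normalize(k)
    n, d, d_v = len(k), len(k[0]), len(v[0])
    # Key/value summaries are computed once and shared by every query.
    kv = [[sum(k[j][a] * v[j][c] for j in range(n)) for c in range(d_v)]
          for a in range(d)]
    k_sum = [sum(k[j][a] for j in range(n)) for a in range(d)]
    v_sum = [sum(v[j][c] for j in range(n)) for c in range(d_v)]
    out = []
    for qi in q:
        denom = n + sum(qa * ka for qa, ka in zip(qi, k_sum))
        out.append([(v_sum[c] + sum(qi[a] * kv[a][c] for a in range(d))) / denom
                    for c in range(d_v)])
    return out


def random_tokens(n, dim, rng):
    """Hypothetical helper: n random token vectors of length dim."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    q, k, v = (random_tokens(16, 4, rng) for _ in range(3))
    slow = taylor_attention_quadratic(q, k, v)
    fast = taylor_attention_linear(q, k, v)
    gap = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
    print("max difference between the two versions:", gap)
```

The two functions compute the same quantity; only the order of the multiplications differs. The quadratic version does work proportional to N² per feature channel, while the linear version builds the small `kv`, `k_sum`, and `v_sum` summaries once and reuses them for every query, which is what makes high-resolution inputs tractable. The remaining gap to true softmax-attention is the truncation error of the expansion, which is the error the abstract's refinement module is said to correct.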