Transformer networks have recently achieved impressive performance in image denoising by exploiting self-attention (SA) in images. However, because SA has quadratic complexity, most existing methods compute it within a relatively small window, which limits the model's ability to capture long-range image information. In this paper, we propose the spatial-frequency attention network (SFANet) to enhance the network's ability to exploit long-range dependencies. In the spatial attention module (SAM), we adopt dilated SA to model long-range dependencies. In the frequency attention module (FAM), we exploit more global information through the Fast Fourier Transform (FFT) by designing a window-based frequency channel attention (WFCA) block that effectively models deep frequency features and their dependencies. To make the module applicable to images of different sizes and to keep the model consistent between training and inference, we apply window-based FFT with a set of fixed window sizes. In addition, channel attention is computed on both the real and imaginary parts of the Fourier spectrum, which further improves restoration performance. The proposed WFCA block can effectively model long-range image dependencies with acceptable complexity. Experiments on multiple denoising benchmarks demonstrate the leading performance of SFANet.
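The window-based frequency channel attention idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squeeze-and-excitation style gate, the reduction ratio, the random weights, the non-overlapping window partition, and the use of a single window size (the abstract mentions a set of fixed sizes) are all assumptions made for illustration, and the names `make_params`, `channel_attention`, and `wfca` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_params(channels, reduction=4, seed=0):
    """Random weights for two gates (real and imaginary parts). Illustrative only."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)

    def gate_weights():
        return (rng.standard_normal((channels, hidden)) * 0.1,
                rng.standard_normal((hidden, channels)) * 0.1)

    return {"real": gate_weights(), "imag": gate_weights()}


def channel_attention(feat, w1, w2):
    """Assumed squeeze-and-excitation style gate on a real array of shape (N, C, h, w)."""
    pooled = feat.mean(axis=(2, 3))          # (N, C): one descriptor per channel
    hidden = np.maximum(pooled @ w1, 0.0)    # ReLU bottleneck
    gate = sigmoid(hidden @ w2)              # (N, C), each value in (0, 1)
    return feat * gate[:, :, None, None]


def wfca(x, window, params):
    """Apply per-window FFT, gate real and imaginary parts, and invert.

    x: real array of shape (C, H, W); H and W must be divisible by `window`.
    """
    c, h, w = x.shape
    assert h % window == 0 and w % window == 0, "pad the input to a multiple of window"
    nh, nw = h // window, w // window

    # Partition into non-overlapping windows: (nh * nw, C, window, window).
    wins = (x.reshape(c, nh, window, nw, window)
             .transpose(1, 3, 0, 2, 4)
             .reshape(nh * nw, c, window, window))

    # The FFT size depends only on `window`, not on the image size.
    spec = np.fft.rfft2(wins, axes=(-2, -1))

    # Channel attention is computed separately on the real and imaginary parts.
    real = channel_attention(spec.real, *params["real"])
    imag = channel_attention(spec.imag, *params["imag"])

    out = np.fft.irfft2(real + 1j * imag, s=(window, window), axes=(-2, -1))

    # Merge the windows back into a (C, H, W) feature map.
    return (out.reshape(nh, nw, c, window, window)
               .transpose(2, 0, 3, 1, 4)
               .reshape(c, h, w))


if __name__ == "__main__":
    params = make_params(channels=8)
    feats = np.random.default_rng(1).standard_normal((8, 32, 48))
    print(wfca(feats, window=16, params=params).shape)
```

Because the transform is taken over windows of a fixed size, the same weights apply to inputs of any spatial size that is a multiple of the window, which is the training/inference consistency property the abstract refers to. Within each window every frequency coefficient depends on all pixels of that window, so the gate acts on window-global information at FFT cost rather than quadratic attention cost.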