Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, these networks struggle to capture global information. Attention mechanisms in Transformers promise to circumvent this problem, but at the cost of heavy computational overhead. Many recent studies in image restoration have therefore focused on balancing performance and computational cost via Transformer variants. In this paper, we present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs the Global Context Extractor (GCE), a novel and efficient way to capture global information for image restoration. The GCE module cascades small kernels across convolutional layers to learn global dependencies, without requiring self-attention. Extensive experimental results show that our computationally efficient approach performs competitively with a range of state-of-the-art methods on synthetic image denoising and single image deblurring tasks, and pushes the performance boundary further on the real image denoising task.
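The core intuition behind GCE can be illustrated with a small sketch (this is not the paper's exact module, only a toy demonstration of the underlying principle): cascading convolutions with small kernels grows the receptive field layer by layer, so a deep enough stack can aggregate context far beyond any single kernel's extent.

```python
import numpy as np

def receptive_field_after_cascade(num_layers: int, kernel_size: int = 3) -> int:
    """Support of a unit impulse after `num_layers` stride-1 convolutions
    with a small `kernel_size`-wide kernel (1D, for simplicity)."""
    signal = np.array([1.0])                       # unit impulse
    kernel = np.ones(kernel_size) / kernel_size    # small averaging kernel
    for _ in range(num_layers):
        # 'full' convolution: support grows by (kernel_size - 1) per layer
        signal = np.convolve(signal, kernel)
    return int(np.count_nonzero(signal))

# Each additional 3-wide layer widens the receptive field by 2:
# 1 layer -> 3, 3 layers -> 7, 10 layers -> 21, and so on.
```

This linear growth is why stacked small kernels can serve as a cheap substitute for the global interactions that self-attention provides in one step, trading a single quadratic-cost operation for a cascade of cheap local ones.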