Transformers have demonstrated their effectiveness in image restoration tasks. Existing Transformer architectures typically comprise two essential components: multi-head self-attention and a feed-forward network (FFN). The former captures long-range pixel dependencies, while the latter enables the model to learn complex patterns and relationships in the data. Previous studies have shown that FFNs act as key-value memories \cite{geva2020transformer} and are vital to modern Transformer architectures. In this paper, we conduct an empirical study to explore the potential of attention mechanisms without an FFN and propose novel structures demonstrating that removing the FFN is feasible for image restoration. Specifically, we propose Continuous Scaling Attention (\textbf{CSAttn}), a method that computes attention continuously in three stages without using an FFN. To achieve competitive performance, we introduce a series of key components within the attention mechanism. Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect model performance. We apply \textbf{CSAttn} to several image restoration tasks and show that our model outperforms both CNN-based and Transformer-based image restoration approaches.