We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of an Embedding-Free Transformer (EFT) encoder and an all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on modeling global non-linearity rather than on the specific roles of the query, key and value. For the decoder, we explore an optimized structure that accounts for globality, which improves semantic segmentation performance. In addition, we propose a novel Inference Spatial Reduction (ISR) method for computational efficiency. Unlike previous spatial reduction attention methods, our ISR method further reduces the key-value resolution at the inference phase, which mitigates the computation-performance trade-off for efficient semantic segmentation. Our EDAFormer achieves state-of-the-art performance with efficient computation compared to existing transformer-based semantic segmentation models on three public benchmarks: ADE20K, Cityscapes and COCO-Stuff. Furthermore, our ISR method reduces the computational cost by up to 61% with minimal mIoU degradation on the Cityscapes dataset. The code is available at https://github.com/hyunwoo137/EDAFormer.
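To make the two core ideas concrete, the following is a minimal sketch, not the paper's actual implementation: an attention step with no learned query/key/value projections (the feature map plays all three roles, so the softmax supplies the global non-linearity, in the spirit of EFA), with a `reduction` parameter that subsamples key-value tokens. The function name, token subsampling scheme, and scaling are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def embedding_free_attention(x, reduction=1):
    """Hypothetical sketch of attention without Q/K/V projections.

    x: (n_tokens, dim) feature map. The features themselves act as
    query, key and value, so no embedding weights are involved.
    `reduction` subsamples the key-value tokens, mimicking spatial
    reduction attention; the score matrix cost scales linearly with
    the number of key-value tokens kept.
    """
    kv = x[::reduction]                        # reduced key-value tokens
    scores = x @ kv.T / np.sqrt(x.shape[-1])   # similarity, no learned weights
    return softmax(scores) @ kv                # global non-linear aggregation
```

Raising `reduction` only at inference (e.g. training with `reduction=1`, then calling `embedding_free_attention(x, reduction=2)` at test time) captures the spirit of ISR: the output shape is unchanged while the attention score matrix, and hence the dominant cost, shrinks with the key-value resolution.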