A growing body of evidence shows that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A lightweight version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated on image classification, object detection and instance segmentation tasks, where consistent improvements are observed. For example, MRLA improves Top-1 accuracy on ResNet-50 by 1.6% while introducing only 0.16M parameters and 0.07B FLOPs. Surprisingly, it boosts box AP and mask AP by a large margin of 3-4% in dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.
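To make the core idea concrete, the following is a minimal NumPy sketch of a query from the current layer attending over per-layer features. It is not the implementation from the linked repository: the function name `layer_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the use of one pooled feature vector per layer, and the choice to include the current layer among the attended layers are all assumptions made for illustration. The recurrent formulation and the lightweight variant are not covered here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_attention(layer_feats, w_q, w_k, w_v, num_heads):
    """Multi-head attention across layers (illustrative sketch only).

    layer_feats: (t, d) array, one feature vector per layer (assumed pooled);
                 the last row is the current layer.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters).
    Returns the attended feature of shape (d,) and the attention weights
    of shape (num_heads, t).
    """
    t, d = layer_feats.shape
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads

    # The query comes from the current layer only.
    q = (layer_feats[-1] @ w_q).reshape(num_heads, head_dim)
    # Keys and values come from every layer up to and including the current one.
    k = (layer_feats @ w_k).reshape(t, num_heads, head_dim).transpose(1, 0, 2)
    v = (layer_feats @ w_v).reshape(t, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product scores over the layer axis, one row per head.
    scores = np.einsum("hd,htd->ht", q, k) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Weighted sum of per-layer values, then concatenate the heads.
    out = np.einsum("ht,htd->hd", weights, v).reshape(d)
    return out, weights


# Example: 4 layers, 8-dimensional features, 2 heads, random parameters.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, weights = layer_attention(feats, w_q, w_k, w_v, num_heads=2)
```

In this sketch, calling `layer_attention` at layer t touches t sets of keys and values, so applying it at every layer of a T-layer network costs on the order of 1 + 2 + ... + T attention terms. This is presumably the quadratic cost in depth that the lightweight version is meant to reduce.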