Designing an effective channel attention mechanism requires finding a lossy-compression method that allows for optimal feature representation. Despite recent progress in the area, this remains an open problem. FcaNet, the current state-of-the-art channel attention mechanism, attempted to find such an information-rich compression using Discrete Cosine Transforms (DCTs). One drawback of FcaNet is that there is no natural choice of DCT frequencies. To circumvent this issue, FcaNet experimented on ImageNet to find optimal frequencies. We hypothesize that the choice of frequency plays only a supporting role and that the primary driving force behind the effectiveness of its attention filters is the orthogonality of the DCT kernels. To test this hypothesis, we construct an attention mechanism using randomly initialized orthogonal filters. Integrating this mechanism into ResNet, we create OrthoNet. We compare OrthoNet to FcaNet (and other attention mechanisms) on Birds, MS-COCO, and Places365 and show superior performance. On the ImageNet dataset, our method competes with or surpasses the current state-of-the-art. Our results imply that an optimal choice of filter is elusive and that generalization can be achieved with a sufficiently large number of orthogonal filters. We further investigate other general principles for implementing channel attention, such as its position in the network and channel groupings. Our code is publicly available at https://github.com/hady1011/OrthoNets/
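To make the idea of channel attention with randomly initialized orthogonal filters concrete, the following is a minimal NumPy sketch and not the authors' implementation, which lives in the repository above and may differ. It assumes that the number of channels does not exceed the number of spatial positions (so that all filters can be mutually orthogonal), that the filters are drawn once via a QR decomposition and kept fixed, and that the gating follows a squeeze-and-excitation style two-layer bottleneck. The function names, the reduction ratio `r`, and the weight matrices `w1` and `w2` are hypothetical and introduced only for illustration.

```python
import numpy as np


def random_orthogonal_filters(channels, height, width, rng):
    """Return `channels` filters of shape (height, width) whose flattened
    forms are orthonormal vectors in R^(height * width)."""
    n = height * width
    if channels > n:
        # Assumption of this sketch: at most n mutually orthogonal filters exist.
        raise ValueError("need channels <= height * width")
    gaussian = rng.standard_normal((n, channels))
    # Reduced QR: q has shape (n, channels) with orthonormal columns.
    q, _ = np.linalg.qr(gaussian)
    return q.T.reshape(channels, height, width)


def orthogonal_channel_attention(x, filters, w1, w2):
    """Apply channel attention to x of shape (batch, C, H, W).

    Each channel is compressed to a scalar by its own filter (the lossy
    compression step), then a small bottleneck produces per-channel gates.
    """
    # Squeeze: inner product of each channel's feature map with its filter.
    squeezed = np.einsum("bchw,chw->bc", x, filters)
    # Excitation: ReLU bottleneck followed by a sigmoid gate in (0, 1).
    hidden = np.maximum(squeezed @ w1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))
    # Rescale every channel of the input by its gate.
    return x * gate[:, :, None, None]


rng = np.random.default_rng(0)
C, H, W, r = 16, 7, 7, 4  # hypothetical sizes; r is the reduction ratio
filters = random_orthogonal_filters(C, H, W, rng)
w1 = rng.standard_normal((C, C // r)) * 0.1  # illustrative, untrained weights
w2 = rng.standard_normal((C // r, C)) * 0.1
x = rng.standard_normal((2, C, H, W))
y = orthogonal_channel_attention(x, filters, w1, w2)
```

In this sketch the only structural difference from a DCT-based squeeze is where the filters come from: a fixed set of DCT basis functions would be replaced by the rows of a random orthonormal matrix, which is the substitution the hypothesis above is meant to test.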