We propose an efficient method for cross-head interaction in multi-head self-attention via decomposition. In existing methods based on multi-head self-attention, the attention operation of each head is computed independently. However, we show that interactions across the heads of the attention matrices enhance the information flow of the attention operation. Considering that the attention matrix of each head can be seen as a feature of the network, it is beneficial to establish connectivity between the heads to better capture their interactions. However, a straightforward approach to capturing cross-head interactions is computationally prohibitive, as the complexity grows substantially with the high dimensionality of the attention matrices. In this work, we propose an effective method to decompose the attention operation into query- and key-less components. This results in a more manageable attention matrix size, in particular for the cross-head interactions. Extensive experimental results show that the proposed cross-head interaction approach performs favorably against existing efficient attention methods and state-of-the-art backbone models.
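To make the idea concrete, below is a minimal PyTorch sketch of one possible realization, not the paper's exact formulation. It assumes the query- and key-less components are obtained by scoring tokens against learned per-head probe vectors (hypothetical `query_probe` / `key_probe` parameters), which shrinks each head's attention map from N×N to N; cross-head interaction is then modeled as a simple learned mixing over the head axis (hypothetical `head_mix`), so its cost scales with H·N rather than H·N². All class and parameter names are illustrative.

```python
# Hedged sketch of decomposed, cross-head-interacting attention (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedCrossHeadAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.h = num_heads
        self.dh = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learned probe vectors standing in for the full query/key in each branch,
        # so each branch yields an (H, N) score map instead of an (H, N, N) matrix.
        self.query_probe = nn.Parameter(torch.randn(num_heads, self.dh))
        self.key_probe = nn.Parameter(torch.randn(num_heads, self.dh))
        # Cross-head interaction: a small learned mixing over the head axis.
        self.head_mix = nn.Linear(num_heads, num_heads, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.h, self.dh).transpose(1, 2)  # (B, H, N, dh)
        k = k.view(B, N, self.h, self.dh).transpose(1, 2)
        v = v.view(B, N, self.h, self.dh).transpose(1, 2)

        # Key-less branch: score each query token against a learned probe -> (B, H, N).
        keyless = torch.einsum('bhnd,hd->bhn', q, self.query_probe) * self.dh ** -0.5
        # Query-less branch: score each key token against a learned probe -> (B, H, N).
        queryless = torch.einsum('bhnd,hd->bhn', k, self.key_probe) * self.dh ** -0.5

        # Cross-head interaction on the compact (B, H, N) maps: information is mixed
        # across heads before normalization, costing O(H^2 N) rather than O(H^2 N^2).
        keyless = self.head_mix(keyless.transpose(1, 2)).transpose(1, 2)
        queryless = self.head_mix(queryless.transpose(1, 2)).transpose(1, 2)

        a_q = F.softmax(keyless, dim=-1).unsqueeze(-1)    # (B, H, N, 1)
        a_k = F.softmax(queryless, dim=-1).unsqueeze(-1)  # (B, H, N, 1)

        # Aggregate values with the query-less weights into a per-head context vector,
        # then redistribute it to tokens with the key-less weights.
        ctx = (a_k * v).sum(dim=2, keepdim=True)          # (B, H, 1, dh)
        out = a_q * ctx                                   # (B, H, N, dh)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Usage: y = DecomposedCrossHeadAttention(dim=64, num_heads=8)(torch.randn(2, 16, 64))
```

Under these assumptions, the cross-head mixing operates on vectors of length N per head rather than full N×N attention matrices, which is what makes the interaction tractable.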