The attention score matrix ${\rm SoftMax}(XY^T)$ encodes relational similarity patterns between objects and is ubiquitous in machine learning. However, computing it scales quadratically with the problem size, making it computationally expensive. In this article, we propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms. On several pre-trained embeddings, we show that the accuracy of our estimation formula surpasses that of competing kernel methods, sometimes by orders of magnitude. Building on this result, we design a linear-time, task-agnostic embedding algorithm based on optimizing the attention scores. The proposed algorithm is highly interpretable and easily adapted to arbitrary embedding problems. We evaluate it on several use cases and observe similar or better performance and lower computational time compared with comparable embedding algorithms.
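For reference, the exact quadratic-time computation whose normalization constants the proposed method approximates can be sketched as follows. This is a minimal pure-Python illustration (the function name and list-based representation are ours, not the paper's); each row of the result requires a dot product against every key vector, hence the $O(nm)$ cost.

```python
import math

def attention_scores(X, Y):
    """Row-wise SoftMax(X @ Y^T), computed exactly in quadratic time.

    X: list of n query vectors, Y: list of m key vectors (same dimension).
    Returns an n x m matrix whose rows each sum to 1.
    """
    scores = []
    for x in X:
        # One dot product per key vector: this inner loop is the O(m) cost
        # incurred for each of the n queries.
        logits = [sum(a * b for a, b in zip(x, y)) for y in Y]
        mx = max(logits)                      # shift for numerical stability
        exps = [math.exp(l - mx) for l in logits]
        Z = sum(exps)                         # the normalization constant
        scores.append([e / Z for e in exps])
    return scores
```

Estimating $Z$ directly, rather than summing all $m$ exponentials per query, is what removes the quadratic factor.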