We consider function spaces defined by self-attention networks without normalization and analyze their geometry theoretically. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by describing the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and verify it numerically in the deep case.
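To make the polynomial structure and the identifiability question concrete, the following is a minimal numerical sketch, not the construction used in the paper. It assumes a single unnormalized attention layer of the form X ↦ V X Xᵀ Kᵀ Q X, with tokens stored as columns of X; this form, the function name `lightning_attention`, and all shapes are assumptions made for illustration. The sketch checks that the layer is homogeneous of degree 3 in X, and that two simple reparametrizations leave the function unchanged, so the corresponding parameters lie in the same fiber of the parametrization. These two symmetries are only examples and are not claimed to exhaust the generic fibers.

```python
import numpy as np


def lightning_attention(X, Q, K, V):
    """Single self-attention layer without softmax (assumed form).

    X: (d, t) input, one token per column.
    Q, K: (a, d) query and key matrices.
    V: (d_out, d) value matrix.
    Returns V X X^T K^T Q X, of shape (d_out, t).
    """
    scores = (K @ X).T @ (Q @ X)  # (t, t) attention weights, no normalization
    return V @ X @ scores


# Hypothetical sizes chosen only for the demonstration.
d, t, a, d_out = 3, 4, 2, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((d, t))
Q = rng.standard_normal((a, d))
K = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))

Y = lightning_attention(X, Q, K, V)

# Polynomial structure: every output entry is a cubic form in the entries
# of X, so scaling X by c scales the output by c**3.
c = 2.0
is_cubic = np.allclose(lightning_attention(c * X, Q, K, V), c**3 * Y)

# Symmetry 1: the layer depends on Q and K only through K^T Q, so
# (Q, K) -> (C Q, C^{-T} K) gives the same function for invertible C.
C = rng.standard_normal((a, a))  # invertible with probability one
same_after_gl = np.allclose(
    lightning_attention(X, C @ Q, np.linalg.inv(C).T @ K, V), Y
)

# Symmetry 2: rescaling V by s and Q by 1/s cancels out.
s = 3.0
same_after_scaling = np.allclose(lightning_attention(X, Q / s, K, s * V), Y)

print(is_cubic, same_after_gl, same_after_scaling)
```

Because distinct parameter tuples produce the same function, the dimension of the function space is smaller than the raw parameter count; the fiber description mentioned above is what makes this gap precise.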