We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial maps, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by describing the generic fibers of the parametrization map for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and verify it numerically in the deep case.
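To make the polynomiality claim concrete, the following is a minimal sketch of one unnormalized self-attention layer under a standard parametrization; the names `Wq`, `Wk`, `Wv` and the map `X ↦ (X Wq Wkᵀ Xᵀ) X Wv` are illustrative assumptions, not taken from the abstract. Dropping the softmax makes the layer a homogeneous polynomial map of degree 3 in the input.

```python
import numpy as np

# Illustrative sketch (not the paper's exact model): one self-attention
# layer with the softmax normalization removed. The resulting map
#   X -> (X Wq Wk^T X^T) X Wv
# is polynomial in X (and in the parameters), which is what lets tools
# from algebraic geometry apply.

rng = np.random.default_rng(0)
n, d = 4, 3  # number of tokens, embedding dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attention_no_norm(X):
    scores = X @ Wq @ Wk.T @ X.T  # raw, un-normalized attention scores
    return scores @ X @ Wv        # no softmax: a degree-3 polynomial map

X = rng.standard_normal((n, d))
Y = attention_no_norm(X)

# Sanity check: scaling the input by t scales the output by t**3,
# as expected of a homogeneous degree-3 polynomial map.
t = 2.0
assert np.allclose(attention_no_norm(t * X), t**3 * Y)
```

Composing such layers yields polynomial maps of higher degree, which is why the analysis of deep attention in the abstract can proceed through the geometry of the image of a polynomial parametrization.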