Self-Attention Networks Can Process Bounded Hierarchical Languages

From arXiv. ACL 2021. 19 pages with extended appendix. v2 fixes a small typo in the formula at the end of page 5 (thanks to Gabriel Faria). Code: https://github.com/princeton-nlp/dyck-transformer

Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process $\mathsf{Dyck}_{k, D}$, the subset of $\mathsf{Dyck}_{k}$ with depth bounded by $D$, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with $D+1$ layers and $O(\log k)$ memory size (per token per layer) that recognizes $\mathsf{Dyck}_{k, D}$, and a soft-attention network with two layers and $O(\log k)$ memory size that generates $\mathsf{Dyck}_{k, D}$. Experiments show that self-attention networks trained on $\mathsf{Dyck}_{k, D}$ generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
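
To make the definition of $\mathsf{Dyck}_{k, D}$ concrete, here is a minimal reference recognizer in Python. It is an ordinary sequential stack check, not the hard-attention construction from the paper. The token encoding and the names `in_dyck_kD`, `parse`, and `PAIRS` are hypothetical choices made only for illustration. A string belongs to $\mathsf{Dyck}_{k, D}$ when every closing bracket matches the type of the most recent unmatched opening bracket, every bracket is eventually closed, and the number of simultaneously open brackets never exceeds $D$.

```python
from typing import List, Sequence, Tuple

# Hypothetical token encoding (not from the paper): ("(", t) opens a bracket
# of type t and (")", t) closes one, where 0 <= t < k.
Token = Tuple[str, int]

# Illustrative surface alphabet for parse(); supports k <= 3 bracket types.
PAIRS = ["()", "[]", "{}"]


def in_dyck_kD(tokens: Sequence[Token], k: int, D: int) -> bool:
    """Return True iff tokens is well nested over k bracket types and the
    nesting depth never exceeds D, i.e. tokens is in Dyck_{k,D}."""
    stack: List[int] = []  # types of currently open brackets; len == depth
    for kind, t in tokens:
        if not 0 <= t < k:
            return False  # bracket type outside the k allowed types
        if kind == "(":
            stack.append(t)
            if len(stack) > D:
                return False  # depth bound D exceeded
        elif kind == ")":
            if not stack or stack.pop() != t:
                return False  # unmatched or mismatched closing bracket
        else:
            return False  # not a bracket token
    return not stack  # every opening bracket must have been closed


def parse(s: str) -> List[Token]:
    """Map a string over PAIRS to tokens, e.g. "[]" -> [("(", 1), (")", 1)]."""
    tokens: List[Token] = []
    for ch in s:
        for t, (open_ch, close_ch) in enumerate(PAIRS):
            if ch == open_ch:
                tokens.append(("(", t))
                break
            if ch == close_ch:
                tokens.append((")", t))
                break
        else:
            raise ValueError(f"not a bracket character: {ch!r}")
    return tokens


print(in_dyck_kD(parse("([])"), k=2, D=2))  # True
print(in_dyck_kD(parse("([])"), k=2, D=1))  # False: depth 2 exceeds D = 1
print(in_dyck_kD(parse("([)]"), k=2, D=2))  # False: crossed brackets
```

This sequential sketch keeps a stack of up to $D$ bracket types, each needing $\lceil \log_2 k \rceil$ bits, so its state grows with $D$. By contrast, the constructions described in the abstract use $O(\log k)$ memory per token per layer.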
