Nondeterministic Stacks in Neural Networks

Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide an analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.
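
The abstract's central device is a dynamic program that sums over exponentially many runs of a nondeterministic pushdown automaton (PDA). The sketch below illustrates the general idea with a weighted, real-time PDA, where every input symbol triggers exactly one push, replace, or pop. It follows the spirit of Lang's classic dynamic program for nondeterministic PDAs. It is not the dissertation's implementation. The move encoding, the names `lang_dp`, `brute_force`, `BOTTOM`, `START`, and `PALINDROME`, and the use of plain Python floats in place of differentiable tensors produced by a neural controller are all hypothetical choices made for illustration. The dynamic program stores, for each pair of times (i, t), only weights indexed by (state, symbol) pairs, so its cost is cubic in the input length. The brute-force function tracks whole stacks and serves only as a correctness check.

```python
from collections import defaultdict

# Hypothetical conventions for this sketch (not the dissertation's code):
BOTTOM = "$"  # initial stack symbol; it may be replaced but is never popped
START = 0     # initial PDA state


def lang_dp(w, trans):
    """Total weight of all runs of a weighted real-time PDA on input w, grouped by
    (final state, top stack symbol). trans[c] lists moves (kind, q, x, r, y, weight)
    for input symbol c: "push" turns top x into x y, "repl" turns x into y, and
    "pop" removes x (y is ignored). Cost is O(len(w)**3) for a fixed PDA.
    """
    n = len(w)
    # gamma[(i, t)][(q, x, r, y)]: weight of run segments that start in state q at
    # time i with x on top, push a new level on x at step i+1, never pop that level,
    # and end in state r at time t with y on that level (the top).
    # Time -1 is an imaginary step that "pushes" the bottom symbol at time 0.
    gamma = {(-1, 0): {(START, BOTTOM, START, BOTTOM): 1.0}}
    # alpha[t][(r, y)]: weight of all runs reaching state r with y on top at time t.
    alpha = {-1: {(START, BOTTOM): 1.0}, 0: {(START, BOTTOM): 1.0}}
    for t in range(1, n + 1):
        push, repl, pop = [], defaultdict(list), defaultdict(list)
        for kind, q, x, r, y, p in trans.get(w[t - 1], []):
            if kind == "push":
                push.append((q, x, r, y, p))
            elif kind == "repl":
                repl[(q, x)].append((r, y, p))
            elif kind == "pop":
                pop[(q, x)].append((r, p))
        for i in range(-1, t):
            g = defaultdict(float)
            if i == t - 1:  # push: the new level is created by this step
                for q, x, r, y, p in push:
                    g[(q, x, r, y)] += p
            else:  # replace the top of a segment that ended at time t-1
                for (q0, x0, s, z), w1 in gamma[(i, t - 1)].items():
                    for r, y, p in repl.get((s, z), []):
                        g[(q0, x0, r, y)] += w1 * p
            # pop a level pushed at step k+1, re-exposing y1 (the top at time k)
            for k in range(i + 1, t - 1):
                for (q0, x0, u, y1), w1 in gamma[(i, k)].items():
                    for (u2, y2, s, z), w2 in gamma[(k, t - 1)].items():
                        if (u2, y2) == (u, y1):
                            for r, p in pop.get((s, z), []):
                                g[(q0, x0, r, y1)] += w1 * w2 * p
            gamma[(i, t)] = dict(g)
        # combine every prefix weight with every segment ending at time t
        a_t = defaultdict(float)
        for i in range(-1, t):
            for (q, x, r, y), wg in gamma[(i, t)].items():
                if (q, x) in alpha[i]:
                    a_t[(r, y)] += alpha[i][(q, x)] * wg
        alpha[t] = dict(a_t)
    return alpha[n]


def brute_force(w, trans):
    """Reference check: track every distinct (state, whole stack) configuration."""
    configs = {(START, (BOTTOM,)): 1.0}
    for c in w:
        nxt = defaultdict(float)
        for (q, stack), wt in configs.items():
            for kind, q1, x1, r, y, p in trans.get(c, []):
                if (q1, x1) != (q, stack[-1]):
                    continue
                if kind == "push":
                    nxt[(r, stack + (y,))] += wt * p
                elif kind == "repl":
                    nxt[(r, stack[:-1] + (y,))] += wt * p
                elif kind == "pop" and len(stack) > 1:  # never pop the bottom level
                    nxt[(r, stack[:-1])] += wt * p
        configs = nxt
    totals = defaultdict(float)
    for (r, stack), wt in configs.items():
        totals[(r, stack[-1])] += wt
    return dict(totals)


# Example: even-length palindromes w w^R. State 0 pushes each symbol; on reading c
# with c on top it may instead guess the midpoint and pop, moving to state 1.
PALINDROME = {
    c: [("push", 0, x, 0, c, 1.0) for x in (BOTTOM, "a", "b")]
    + [("pop", 0, c, 1, None, 1.0), ("pop", 1, c, 1, None, 1.0)]
    for c in "ab"
}
print(lang_dp("abba", PALINDROME).get((1, BOTTOM), 0.0))      # 1.0
print(brute_force("abba", PALINDROME).get((1, BOTTOM), 0.0))  # 1.0
```

Here the weight of `(1, BOTTOM)` counts accepting runs. For `abba`, exactly one run guesses the midpoint correctly and returns to the bottom of the stack. Tracking only (state, top symbol) pairs per pair of time steps is what keeps the dynamic program polynomial. In a stack-augmented network, the move weights would presumably be emitted by the controller at each step, and the same sums would be computed with tensor operations so that gradients can flow.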
