Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide an analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.