Natural languages are believed to be (mildly) context-sensitive. Yet despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks, even though the context-free languages are properly contained in the mildly context-sensitive class. To address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability. We show that this addition enables the transformer to model some, but not all, deterministic context-free languages.
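The abstract does not spell out the construction, but a common way to make a stack differentiable, and thus trainable end to end, is a "superposition" stack in the style of Joulin and Mikolov (2015): each position emits a soft distribution over push, pop, and no-op actions, the stack state becomes the expectation over the three resulting stacks, and the attended value is the expected top element. The PyTorch sketch below illustrates that idea under these assumptions; the class name `DifferentiableStackAttention`, the `stack_depth` parameter, and the choice to read only the top of the stack are illustrative, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferentiableStackAttention(nn.Module):
    """Minimal differentiable ("superposition") stack sketch.

    At each position the model emits a soft distribution over
    {push, pop, no-op}; the new stack is the convex combination of the
    three resulting stacks, and the output is the expected top element.
    Illustrative only; not the paper's exact mechanism.
    """

    def __init__(self, d_model: int, stack_depth: int = 16):
        super().__init__()
        self.stack_depth = stack_depth
        self.action_proj = nn.Linear(d_model, 3)       # logits: push / pop / no-op
        self.value_proj = nn.Linear(d_model, d_model)  # element to push
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        # stack: (batch, depth, d_model); slot 0 is the top of the stack.
        stack = x.new_zeros(batch, self.stack_depth, d_model)
        outputs = []
        for t in range(seq_len):
            h = x[:, t]                                 # (batch, d_model)
            a = F.softmax(self.action_proj(h), dim=-1)  # (batch, 3)
            v = torch.tanh(self.value_proj(h))          # (batch, d_model)
            # Candidate stacks after each discrete action.
            pushed = torch.cat([v.unsqueeze(1), stack[:, :-1]], dim=1)
            popped = torch.cat(
                [stack[:, 1:], stack.new_zeros(batch, 1, d_model)], dim=1
            )
            noop = stack
            # Soft update: expectation over the action distribution.
            stack = (a[:, 0, None, None] * pushed
                     + a[:, 1, None, None] * popped
                     + a[:, 2, None, None] * noop)
            outputs.append(stack[:, 0])                 # expected top of stack
        out = torch.stack(outputs, dim=1)               # (batch, seq_len, d_model)
        return self.out_proj(out)
```

One plausible reading of the claim that the mechanism "can be incorporated into any transformer-based language model" is to insert such a module as an extra sublayer with a residual connection, e.g. `x = x + stack_attn(x)` inside each transformer block; the soft action distributions also give the interpretability handle the abstract mentions, since they can be inspected as approximate push/pop decisions.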