Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent the bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This raises a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed, namely those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the ability of RNN LMs to efficiently represent this diverse class of LMs suggests novel interpretations of their inductive bias.
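To make the class of LMs in question concrete, the following is a minimal sketch, assuming a Dyck-1 (matched-bracket) setting, of an LM defined by a pushdown automaton whose stack depth is bounded and whose stack update function pushes or pops at most one symbol per step. The class name BoundedStackPDALM, the depth bound max_depth, and the probability p_open are illustrative assumptions for exposition, not the construction of Hewitt et al. (2020) or of this paper.

```python
import random

EOS = "$"  # end-of-string symbol (illustrative)

class BoundedStackPDALM:
    """Sketch of an LM given by a PDA with a stack of bounded depth."""

    def __init__(self, max_depth: int, p_open: float = 0.4):
        self.max_depth = max_depth  # bound on the stack size
        self.p_open = p_open        # hypothetical probability mass on '('

    def next_symbol_probs(self, stack: list) -> dict:
        """Conditional distribution over the next symbol given the stack."""
        if not stack:
            # Empty stack: either open a bracket or end the string.
            return {"(": self.p_open, EOS: 1.0 - self.p_open}
        if len(stack) == self.max_depth:
            # Full stack: the bounded PDA can only pop.
            return {")": 1.0}
        return {"(": self.p_open, ")": 1.0 - self.p_open}

    def step(self, stack: list, symbol: str) -> list:
        """Stack update function: '(' pushes one symbol, ')' pops one."""
        if symbol == "(":
            return stack + ["X"]
        if symbol == ")":
            return stack[:-1]
        return stack

    def sample(self, rng: random.Random) -> str:
        """Draw one string from the LM by walking the automaton."""
        stack, out = [], []
        while True:
            probs = self.next_symbol_probs(stack)
            symbol = rng.choices(list(probs), weights=list(probs.values()))[0]
            if symbol == EOS:
                return "".join(out)
            out.append(symbol)
            stack = self.step(stack, symbol)

# Usage: sample a few bounded-depth Dyck-1 strings.
lm = BoundedStackPDALM(max_depth=3)
rng = random.Random(0)
print([lm.sample(rng) for _ in range(5)])
```

Because the stack depth is bounded, the automaton has only finitely many stack configurations, so a fixed-size RNN hidden state can encode them in principle; the question the abstract raises is how *efficiently* such configurations can be encoded.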