We design controlled experiments to study how generative language models, like GPT, learn context-free grammars (CFGs): diverse language systems with a tree-like structure that capture many aspects of natural languages, programs, and logics. CFGs are as hard as pushdown automata, and can be ambiguous, so that verifying whether a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for difficult (long and ambiguous) CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and impressive diversity. More importantly, we delve into the physical principles behind how transformers learn CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (such as putting tree node information exactly on the subtree boundary), and that the model learns to form "boundary to boundary" attentions resembling dynamic programming. We also cover some extensions of CFGs, as well as the robustness of transformers to grammar mistakes. Overall, our research provides a comprehensive, empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms transformers use to capture the structure and rules of languages.
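To illustrate why checking membership in an ambiguous CFG calls for dynamic programming, the following is a minimal sketch of the classic CYK algorithm. It is not the setup used in this work: the toy grammar, the rule tables `BINARY_RULES` and `TERMINAL_RULES`, and the function name `cyk_accepts` are all hypothetical and chosen only for illustration. The sketch assumes the grammar is already in Chomsky normal form.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper):
#   S -> A B | B A
#   A -> A A | 'a'      (A -> A A makes strings such as "aaa" ambiguous)
#   B -> 'b'
BINARY_RULES = {
    ("A", "B"): {"S"},
    ("B", "A"): {"S"},
    ("A", "A"): {"A"},
}
TERMINAL_RULES = {
    "a": {"A"},
    "b": {"B"},
}


def cyk_accepts(tokens, start="S"):
    """Return True if `tokens` can be derived from `start` under the toy grammar."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that derives tokens[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Base case: spans of length 1 come from terminal rules.
    for i, tok in enumerate(tokens):
        table[i][i] = set(TERMINAL_RULES.get(tok, ()))
    # Longer spans are built from every split point into two shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

Each table cell summarizes a span by the nonterminals that can generate it, and longer spans are resolved by combining information at the boundaries of shorter ones. This is the textbook dynamic program that the abstract's "boundary to boundary" comparison alludes to; how closely a trained transformer matches it is an empirical claim of the paper, not something this sketch demonstrates.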