Language models learn rare syntactic phenomena, but it has been argued that they do so through rote memorization rather than grammatical generalization. To test this, we iteratively trained transformer language models on systematically manipulated corpora of human scale (100M words) and then evaluated their learning of a particular rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (``a beautiful five days''). We first compared how well this construction was learned on the default corpus relative to a counterfactual corpus from which all AANN sentences had been removed. Even on the counterfactual corpus, AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., ``a few days''). A further experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that models learn rare grammatical phenomena by generalization from less rare phenomena. Code is available at https://github.com/kanishkamisra/aannalysis
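To make the evaluation concrete, a minimal sketch of the kind of comparison described above, scoring an AANN sentence against an order-perturbed variant under a causal language model, might look as follows. The gpt2 checkpoint and the example sentences here are illustrative stand-ins, not the paper's trained models or stimuli:

\begin{verbatim}
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; the paper instead trains its own models
# on systematically manipulated 100M-word corpora.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence):
    """Total log-probability of the sentence under the model."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean negative log-likelihood per predicted token
        out = model(ids, labels=ids)
    return -out.loss.item() * (ids.size(1) - 1)

aann = "The family spent a beautiful five days in London."
perturbed = "The family spent a five beautiful days in London."

# A model that has learned the AANN construction should assign
# the well-formed order the higher log-probability.
print(sentence_logprob(aann) > sentence_logprob(perturbed))
\end{verbatim}

Comparing log-probabilities on minimal pairs of this kind is a standard way to probe a model's grammatical knowledge without inspecting its internals.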