When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers, two types of neural networks without a hierarchical bias, on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We then evaluate what these models have learned about English yes/no questions, a phenomenon for which hierarchical structure is crucial. We find that, although both model types capture the surface statistics of child-directed speech well (as measured by perplexity), both generalize in a way more consistent with an incorrect linear rule than with the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
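To make the linear/hierarchical contrast concrete, the sketch below states the two candidate rules for forming English yes/no questions in their common textbook form: the linear rule fronts the first auxiliary in the word string, while the hierarchical rule fronts the main-clause auxiliary. This is a toy illustration, not the evaluation code used in this work. The auxiliary set, the function names, and the subject/predicate split (which stands in for a real parse) are all hypothetical assumptions.

```python
from typing import List

# Toy auxiliary inventory (assumption; not an exhaustive list of English auxiliaries).
AUXILIARIES = {"is", "can", "does", "will"}


def linear_rule(words: List[str]) -> List[str]:
    """Incorrect rule: front the linearly first auxiliary in the sentence."""
    i = next(k for k, w in enumerate(words) if w in AUXILIARIES)
    return [words[i]] + words[:i] + words[i + 1:]


def hierarchical_rule(subject: List[str], predicate: List[str]) -> List[str]:
    """Correct rule: front the main-clause auxiliary.

    The (subject, predicate) split is a hypothetical stand-in for a parse:
    the main-clause auxiliary is the first auxiliary after the whole subject NP.
    """
    i = next(k for k, w in enumerate(predicate) if w in AUXILIARIES)
    return [predicate[i]] + subject + predicate[:i] + predicate[i + 1:]


subject = ["the", "boy", "who", "is", "sleeping"]
predicate = ["is", "happy"]
declarative = subject + predicate

print(" ".join(linear_rule(declarative)))
# is the boy who sleeping is happy
print(" ".join(hierarchical_rule(subject, predicate)))
# is the boy who is sleeping happy
```

On a simple sentence such as "the boy is happy", both functions return "is the boy happy", so the rules diverge only when the subject itself contains an auxiliary, as in the relative-clause example above.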