When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers, two types of neural networks without a hierarchical bias, on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We then evaluate what these models have learned about English yes/no questions, a phenomenon for which hierarchical structure is crucial. We find that, although both model types perform well at capturing the surface statistics of child-directed speech (as measured by perplexity), they generalize in a way more consistent with an incorrect linear rule than with the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
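To make the contrast between the two candidate rules concrete, the following is a minimal sketch of how a linear rule and a hierarchical rule diverge when forming a yes/no question. The example sentence, the auxiliary list, and the function names `move_first` and `move_main` are illustrative assumptions and are not taken from the evaluation described above; the position of the main-clause auxiliary is supplied by hand as a stand-in for a syntactic parse.

```python
# Toy auxiliary inventory (an assumption for illustration only).
AUXILIARIES = {"is", "are", "can", "does", "do", "will", "has", "have"}


def move_first(tokens):
    """Linear rule: front the auxiliary that comes first in the word string."""
    i = next(k for k, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def move_main(tokens, main_aux_index):
    """Hierarchical rule: front the main-clause auxiliary.

    The index is given explicitly here; a real system would need
    a parse to identify which auxiliary belongs to the main clause.
    """
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


# Hypothetical declarative with a relative clause on the subject,
# so the first auxiliary ("is") is not the main-clause auxiliary ("can").
declarative = "the boy who is smiling can swim".split()

linear_question = " ".join(move_first(declarative))
hierarchical_question = " ".join(move_main(declarative, 5))

print(linear_question)        # ungrammatical output of the linear rule
print(hierarchical_question)  # grammatical output of the hierarchical rule
```

On simple sentences with a single auxiliary the two rules produce the same question, which is why examples with an auxiliary inside the subject are needed to tell them apart.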