During language acquisition, children successively learn to categorize phonemes, identify words, and combine them with syntax to form new meanings. While the development of this behavior is well characterized, we still lack a unifying computational framework to explain its underlying neural representations. Here, we investigate whether and when phonemic, lexical, and syntactic representations emerge in the activations of artificial neural networks during training. Our results show that both speech- and text-based models follow a sequence of learning stages: during training, their neural activations successively build subspaces whose geometry represents phonemic, lexical, and syntactic structure. While this developmental trajectory qualitatively resembles that of children, it differs quantitatively: these algorithms require two to four orders of magnitude more data for such neural representations to emerge. Together, these results identify conditions under which major stages of language acquisition spontaneously emerge, and hence delineate a promising path toward understanding the computations underpinning language acquisition.