Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition

Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry results from complexity in the visual structure of the categories in the world to which language refers, from the structure of language itself, or from the interplay between these two sources of information. We quantitatively test these three hypotheses about early verb learning using visual and linguistic representations of words drawn from large-scale pre-trained artificial neural networks. Examining the structure of both the visual and the linguistic embedding spaces, we find, first, that verb representations are generally more variable and less discriminable within each domain than noun representations. Second, when only one learning instance per category is available, visual and linguistic representations are less well aligned in the verb system than in the noun system. However, paralleling the course of human language development, when multiple learning instances per category are available, visual and linguistic representations become almost as well aligned in the verb system as in the noun system. Third, we compare the relative contributions of factors that may predict the learning difficulty of individual words. A regression analysis reveals that visual variability is the strongest internal factor driving verb learning, followed by visual-linguistic alignment and linguistic variability. Based on these results, we conclude that verb acquisition is influenced by all three sources of complexity, but that the variability of visual structure poses the most significant challenge for verb learning.
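The abstract does not say how variability or visual-linguistic alignment are computed. As a minimal sketch, assuming each word has one or more embedding vectors per modality, within-category variability can be taken as the mean cosine distance of a category's instances from their centroid, and alignment as a representational-similarity-style correlation between the two modalities' pairwise category similarities. The parameter `k` (a hypothetical name) controls how many instances per category are averaged, loosely mirroring the one-instance versus multiple-instance comparison. These measures, all function names, and the toy vectors are illustrative assumptions, not the authors' method or data.

```python
import math
from itertools import combinations

Vector = list[float]

# NOTE: illustrative sketch only; the measures below are assumptions,
# not the procedure used in the paper.


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def centroid(instances: list[Vector]) -> Vector:
    """Element-wise mean of a category's instance embeddings."""
    n = len(instances)
    return [sum(dims) / n for dims in zip(*instances)]


def variability(instances: list[Vector]) -> float:
    """Hypothetical variability measure: mean cosine distance
    (1 - cosine similarity) from each instance to the category centroid."""
    c = centroid(instances)
    return sum(1.0 - cosine(x, c) for x in instances) / len(instances)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; both lists must have non-zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def alignment(visual: dict[str, list[Vector]],
              linguistic: dict[str, list[Vector]], k: int) -> float:
    """Hypothetical RSA-style alignment: average the first k instances of
    each category per modality, compute all pairwise category cosine
    similarities, and correlate the two modalities' similarity lists.
    Needs at least three categories with non-constant similarities."""
    words = sorted(visual)
    vis = {w: centroid(visual[w][:k]) for w in words}
    lin = {w: centroid(linguistic[w][:k]) for w in words}
    pairs = list(combinations(words, 2))
    vis_sims = [cosine(vis[a], vis[b]) for a, b in pairs]
    lin_sims = [cosine(lin[a], lin[b]) for a, b in pairs]
    return pearson(vis_sims, lin_sims)


# Made-up toy embeddings (3-D) for illustration only.
visual = {
    "ball": [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]],
    "cup": [[0.8, 0.6, 0.0], [0.6, 0.8, 0.0]],
    "run": [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]],
}
linguistic = {
    "ball": [[1.0, 0.0, 0.0]],
    "cup": [[0.0, 0.0, 1.0]],
    "run": [[0.8, 0.6, 0.0]],
}

if __name__ == "__main__":
    for word, insts in visual.items():
        print(word, round(variability(insts), 4))
    print("alignment (k=1):", round(alignment(visual, linguistic, 1), 4))
```

In this toy space the "run" instances are spread more widely than the "ball" instances, so `variability` ranks "run" higher; correlating similarity structure rather than raw vectors lets the two modalities live in embedding spaces of different dimensionality, which is one reason RSA-style comparisons are a common choice for cross-modal alignment.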
