Word class representations spontaneously emerge in a deep neural network trained on next word prediction

How do humans learn language, and can the first language be learned at all? These fundamental questions are still hotly debated. In contemporary linguistics, there are two major schools of thought that give completely opposite answers. According to Chomsky's theory of universal grammar, language cannot be learned because children are not exposed to sufficient data in their linguistic environment. In contrast, usage-based models of language assume a profound relationship between language structure and language use. In particular, contextual mental processing and mental representations are assumed to have the cognitive capacity to capture the complexity of actual language use at all levels. The prime example is syntax, i.e., the rules by which words are assembled into larger units such as sentences. Typically, syntactic rules are expressed as sequences of word classes. However, it remains unclear whether word classes are innate, as implied by universal grammar, or whether they emerge during language acquisition, as suggested by usage-based approaches. Here, we address this issue from a machine learning and natural language processing perspective. In particular, we trained an artificial deep neural network to predict the next word, given sequences of consecutive words as input. Subsequently, we analyzed the activation patterns that emerged in the hidden layers of the neural network. Strikingly, we find that the internal representations of nine-word input sequences cluster according to the word class of the tenth word to be predicted as output, even though the neural network did not receive any explicit information about syntactic rules or word classes during training. This surprising result suggests that, in the human brain as well, abstract representational categories such as word classes may emerge naturally as a consequence of predictive coding and processing during language acquisition.
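The input/output setup described in the abstract (nine consecutive words in, the tenth word out) can be illustrated with a minimal sketch. The function name `make_windows`, the whitespace tokenization, and the toy sentence are assumptions made here for illustration; the abstract does not specify the corpus, the tokenizer, or the network architecture.

```python
from typing import List, Tuple


def make_windows(tokens: List[str], context_len: int = 9) -> List[Tuple[List[str], str]]:
    """Pair each run of `context_len` consecutive words with the word that follows it.

    Hypothetical helper: only the nine-in, one-out shape comes from the abstract.
    """
    pairs = []
    for start in range(len(tokens) - context_len):
        context = tokens[start:start + context_len]   # nine-word input sequence
        target = tokens[start + context_len]          # tenth word, to be predicted
        pairs.append((context, target))
    return pairs


# Toy example with naive whitespace tokenization (an assumption, not the authors' pipeline).
text = "the quick brown fox jumps over the lazy dog and then runs far away"
tokens = text.split()
pairs = make_windows(tokens)

for context, target in pairs:
    print(" ".join(context), "->", target)
```

Note that no word-class labels appear anywhere in these training pairs. In the analysis the abstract describes, such labels would be attached to the target word only afterwards, to check how the hidden-layer representations of the contexts cluster.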

