From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks

Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open-ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.

