Meta predictive learning model of languages in neural circuits

Large language models based on self-attention mechanisms have achieved astonishing performance not only in natural language itself, but also in a variety of tasks of a different nature. However, when processing language, the human brain may not operate on the same principle. This has prompted a debate about the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution, and that only the distribution, rather than the specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are input to the network in sequence, and moreover on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
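
To make the spike-and-slab assumption concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common parameterization in which a weight is exactly zero with probability `pi` (the spike) and is otherwise drawn from a Gaussian with mean `m` and variance `xi` (the slab). The names `pi`, `m`, `xi` and all function names are hypothetical. The sketch shows how the mean and variance of a single weight, and of a weighted sum of inputs, follow from the distribution parameters alone, which is the sense in which a distribution rather than specific weights can be trained.

```python
import random


def sample_weight(pi, m, xi, rng):
    """Draw one weight: 0 with probability pi (spike), else from N(m, xi) (slab)."""
    if rng.random() < pi:
        return 0.0
    return rng.gauss(m, xi ** 0.5)


def weight_moments(pi, m, xi):
    """Closed-form mean and variance of one spike-and-slab weight."""
    mean = (1.0 - pi) * m
    second_moment = (1.0 - pi) * (xi + m * m)
    return mean, second_moment - mean * mean


def preactivation_moments(inputs, params):
    """Mean and variance of sum_i w_i * x_i, assuming independent weights.

    `params` holds one (pi, m, xi) triple per input; this independence
    assumption is what a mean-field treatment relies on.
    """
    mu, var = 0.0, 0.0
    for x, (pi, m, xi) in zip(inputs, params):
        w_mean, w_var = weight_moments(pi, m, xi)
        mu += w_mean * x
        var += w_var * x * x
    return mu, var


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters: one variable connection, one deterministic one.
    params = [(0.5, 2.0, 1.0), (0.0, 3.0, 0.0)]
    print(preactivation_moments([1.0, 2.0], params))
    print([sample_weight(*p, rng) for p in params])
```

Under this parameterization a connection is deterministic when its variance vanishes, for example when `pi = 0` and `xi = 0` (the weight always equals `m`) or when `pi = 1` (the weight is always zero).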

