Large language models based on self-attention mechanisms have achieved astonishing performance not only in natural language itself, but also in a variety of tasks of a different nature. However, when processing language, the human brain may not operate on the same principle. This has prompted a debate on the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution, and that only the distribution is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are input to the network in sequence, and on both toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, improving further with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point for investigating the physical and biological correspondences of language processing and of the unexpected general intelligence.
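To make the spike-and-slab assumption concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common parameterization in which a weight is exactly zero with probability `pi` (the spike) and is otherwise drawn from a Gaussian with mean `m` and variance `xi` (the slab). The names `pi`, `m`, `xi` and all function names are hypothetical and do not come from the abstract. The sketch computes the per-weight mean and variance, draws one network from the ensemble, and propagates the mean and variance of a linear pre-activation under a mean-field (independent weights) assumption.

```python
import numpy as np


def spike_slab_moments(pi, m, xi):
    """Mean and variance of w ~ pi * delta(w) + (1 - pi) * N(m, xi).

    pi, m, xi are assumed (hypothetical) parameter names, broadcastable arrays.
    """
    mu = (1.0 - pi) * m
    # E[w^2] = (1 - pi) * (xi + m^2); subtract the squared mean.
    var = (1.0 - pi) * (xi + m**2) - mu**2
    return mu, var


def sample_spike_slab(pi, m, xi, rng):
    """Draw one weight array from the element-wise spike-and-slab distribution."""
    slab = rng.normal(m, np.sqrt(xi))
    keep = rng.random(np.shape(m)) >= pi  # True with probability 1 - pi
    return np.where(keep, slab, 0.0)


def mean_field_preactivation(pi, m, xi, x):
    """Mean and variance of z = W @ x when the entries of W are independent."""
    mu, var = spike_slab_moments(pi, m, xi)
    return mu @ x, var @ (x**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = np.full((3, 4), 0.3)
    m = rng.normal(size=(3, 4))
    xi = np.full((3, 4), 0.5)
    x = rng.normal(size=4)
    z_mean, z_var = mean_field_preactivation(pi, m, xi, x)
    print("pre-activation mean:", z_mean)
    print("pre-activation variance:", z_var)
```

Under this parameterization a connection has zero variance, and is deterministic in that sense, when `pi` equals 1 (the weight is always zero) or when `pi` equals 0 and `xi` equals 0 (the weight is always `m`). Training only `pi`, `m`, and `xi` rather than the weights themselves is one way to read "only the distribution is trained".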