Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden states, use two-dimensional (2D) matrix-form hidden states. Such 2D-state RNNs, known as Fast Weight Programmers (FWPs), can be interpreted as neural networks whose synaptic weights (called fast weights) change dynamically over time as a function of input observations and serve as short-term memory storage; the corresponding synaptic weight modifications are controlled, or programmed, by another network (the programmer) whose parameters are trained (e.g., by gradient descent). In this Primer, we review the technical foundations of FWPs, their computational characteristics, and their connections to transformers and state space models. We also discuss connections between FWPs and models of synaptic plasticity in the brain, suggesting a convergence of natural and artificial intelligence.
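To make the matrix-form hidden state concrete, the following is a minimal sketch (in plain Python/NumPy) of one step of a canonical outer-product fast-weight update, in which projections of the input play the role of the programmer's key, value, and query signals; the function and variable names (e.g., fwp_step, W_k, W_v, W_q) are illustrative assumptions, not part of any specific implementation discussed in this Primer.

```python
import numpy as np

def fwp_step(F, x_t, W_k, W_v, W_q):
    # The fixed projections W_k, W_v, W_q stand in for a trained slow network
    # (the programmer) that emits a key, value, and query from the input x_t.
    k = W_k @ x_t                 # key: where in the fast weights to write
    v = W_v @ x_t                 # value: what to store
    q = W_q @ x_t                 # query: what to retrieve
    F = F + np.outer(v, k)        # additive outer-product fast-weight update
    y = F @ q                     # read-out from the 2D fast weight matrix
    return F, y

# Toy usage: process a short random sequence with an initially empty memory.
d_in, d_key, d_val = 8, 4, 4
rng = np.random.default_rng(0)
W_k = rng.standard_normal((d_key, d_in))
W_v = rng.standard_normal((d_val, d_in))
W_q = rng.standard_normal((d_key, d_in))
F = np.zeros((d_val, d_key))      # matrix-form short-term memory (fast weights)
for x_t in rng.standard_normal((5, d_in)):
    F, y = fwp_step(F, x_t, W_k, W_v, W_q)
```

This additive outer-product rule is the simplest instance of the fast-weight updates reviewed later; practical FWP variants add nonlinearities, normalization, or learned gating on the update.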