The capabilities of transformer networks, such as those underlying ChatGPT and other Large Language Models (LLMs), have captured the world's attention. A crucial computational mechanism underlying their performance is the transformation of a complete input sequence (for example, all the words in a sentence) into a long "encoding vector" that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, "self-attention" applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity, traveling across single cortical areas or across multiple regions at the whole-brain scale, could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable temporal context to be extracted from sequences of sensory inputs, the same computational principle used in transformers.
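As a concrete illustration of the pairwise-association computation described above, the following is a minimal sketch of scaled dot-product self-attention over a short token sequence. It assumes only NumPy; the function name, array shapes, and random projection matrices are illustrative choices, not the specific implementation discussed in this article.

```python
# Minimal sketch of scaled dot-product self-attention (assumptions noted above).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Compute self-attention for a sequence of token embeddings.

    X  : (T, d) matrix, one d-dimensional embedding per sequence position.
    Wq, Wk, Wv : (d, d) projection matrices for queries, keys, values.
    Returns a (T, d) matrix in which each position becomes a weighted mixture
    of all positions, with weights given by pairwise query-key similarity.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise associations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over sequence positions
    return weights @ V                               # context-weighted summary per position

# Toy usage: 5 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): each position now carries temporal context from the whole sequence
```

The key point for the analogy drawn here is that every output position mixes information from the entire input sequence, so recent history is folded into each position's representation, much as a traveling wave would fold recent input history into the instantaneous spatial pattern of cortical activity.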