The capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs) have captured the world's attention. The crucial computational mechanism underlying their performance relies on transforming a complete input sequence (for example, all the words in a sentence) into a long "encoding vector" that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, "self-attention" applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity, traveling across single cortical regions or across multiple regions at the whole-brain scale, could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable temporal context to be extracted from sequences of sensory inputs, the same computational principle used in transformers.
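As an illustration of the pairwise associations that self-attention computes, the following is a minimal, dependency-free sketch of single-head scaled dot-product attention. It is not taken from the text above: the projection matrices `W_q`, `W_k`, `W_v`, the toy dimensions, and the random inputs are all assumptions chosen for illustration, and a real transformer adds multiple heads, positional information, and learned weights.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X holds one row per token (the whole sequence at once).
    Returns the context-enriched tokens and the token-by-token weights.
    """
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    scale = math.sqrt(len(K[0]))
    K_T = [list(col) for col in zip(*K)]
    # scores[i][j]: association between token i and token j
    scores = [[s / scale for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, d_model, d_head = 5, 8, 4  # toy sizes, assumed for illustration
X = random_matrix(n_tokens, d_model, rng)
W_q = random_matrix(d_model, d_head, rng)
W_k = random_matrix(d_model, d_head, rng)
W_v = random_matrix(d_model, d_head, rng)

context, weights = self_attention(X, W_q, W_k, W_v)
```

Each row of `weights` sums to one and describes how strongly one token draws on every other token in the sequence, so each row of `context` mixes information from the entire input rather than from a single position.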