Autoregressive decoder-only transformers have become key components of scalable sequence processing and generation models. However, the transformer's self-attention mechanism requires transferring the projections of all prior tokens from main memory at every time step (token), which severely limits its performance on conventional processors. Self-attention can be viewed as a dynamic feed-forward layer whose weight matrix depends on the input sequence, similar to the result of local synaptic plasticity. Using this insight, we present a neuromorphic decoder-only transformer model that uses an on-chip plasticity processor to compute self-attention. Interestingly, training transformers enables them to ``learn'' the input context during inference. We demonstrate this in-context learning ability of transformers on the Loihi 2 processor by solving a few-shot classification problem. With this, we emphasize the importance of pretrained models, especially their ability to find simple, local, backpropagation-free learning rules that enable on-chip learning and adaptation in a hardware-friendly manner.
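The equivalence between self-attention and a plasticity-updated feed-forward layer can be illustrated with unnormalized (linear) causal attention, where the running sum of key-value outer products acts as a fast weight matrix. The sketch below is not the paper's Loihi 2 implementation; it is a minimal NumPy illustration, assuming the softmax is replaced by identity so the recurrent form is exact:

```python
import numpy as np

def causal_linear_attention_recurrent(q, k, v):
    """Causal linear attention as a recurrent 'fast weight' update.

    At each step t, the outer product k_t v_t^T is added to a state
    matrix S (a local, Hebbian-style plasticity rule), and the output
    is a plain feed-forward read-out y_t = S^T q_t. No projections of
    past tokens need to be re-fetched from memory.
    """
    d_k, d_v = k.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))          # dynamic feed-forward weight matrix
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        S += np.outer(k_t, v_t)       # local outer-product "plasticity" update
        out.append(S.T @ q_t)         # read-out through the current weights
    return np.array(out)

def causal_linear_attention_parallel(q, k, v):
    """The same computation in the familiar masked-attention form,
    with the softmax replaced by identity (unnormalized attention)."""
    T = q.shape[0]
    scores = q @ k.T                  # (T, T) query-key similarities
    mask = np.tril(np.ones((T, T)))   # causal mask: attend only to the past
    return (scores * mask) @ v
```

Both functions produce identical outputs; the recurrent form makes explicit that each decoding step touches only the fixed-size state S rather than the full history of token projections, which is what makes a plasticity processor a natural substrate for it.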