Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and their ability to capture feature dependencies, respectively. However, transformer architectures typically involve several types of computational layers: linear layers for MLP modules and classification heads, convolution layers for tokenizers, and dot-product computations for self-attention. These diverse operations pose significant challenges for hardware accelerator design, and to our knowledge, no existing hardware solution leverages the spike-form data of SNNs for transformer architectures. In this paper, we introduce VESTA, a novel hardware design that synergizes the two technologies through unified Processing Elements (PEs) capable of efficiently performing all three types of computation crucial to transformer structures. VESTA uniquely exploits the spike-form outputs of the Spike Neuron Layers \cite{zhou2024spikformer}, reducing each multiplication from two 8-bit integer operands to one 8-bit integer and a binary spike. This reduction allows multipliers to be replaced by multiplexers in the PE module, significantly enhancing computational efficiency while preserving the low-power advantage of SNNs. Experimental results show that the core area of VESTA is \(0.844\,\mathrm{mm}^2\); it operates at 500\,MHz and supports real-time image classification at 30\,fps.
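To make the arithmetic simplification concrete, the following minimal C sketch (illustrative only, not the VESTA RTL; identifiers such as \texttt{spike\_mac} are our own) shows how a multiply-accumulate degenerates into a 2-to-1 select once one operand is a binary spike:

\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch, not the VESTA hardware itself: with one operand
 * reduced to a binary spike, the 8-bit x 8-bit multiply collapses to a
 * 2-to-1 multiplexer. spike == 1 selects the weight, spike == 0 selects
 * zero; no multiplier is needed. */
static inline int32_t spike_mac(int32_t acc, int8_t weight, uint8_t spike) {
    return acc + (spike ? (int32_t)weight : 0);  /* mux, then add */
}

int main(void) {
    /* Hypothetical example: dot product of a spike vector with 8-bit
     * weights, as in a spike-driven linear or attention computation. */
    const uint8_t spikes[4]  = {1, 0, 1, 1};
    const int8_t  weights[4] = {23, -7, 45, -12};
    int32_t acc = 0;
    for (int i = 0; i < 4; ++i)
        acc = spike_mac(acc, weights[i], spikes[i]);
    printf("accumulated = %d\n", acc);  /* 23 + 45 - 12 = 56 */
    return 0;
}
\end{verbatim}

In hardware terms, the ternary select corresponds to a multiplexer feeding the accumulator's adder, which is how the spike-form operand removes the 8-bit multiplier from the PE datapath.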