The attention mechanism spends substantial computation on unnecessary calculations, which significantly limits system performance. Sparse attention has been proposed to convert some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory access. We propose CPSAA, a novel crossbar-based, processing-in-memory (PIM) sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based methods. Experimental results show that CPSAA achieves average performance improvements of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X and energy savings of 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X when compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
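To make the DDMM-to-SDDMM/SpMM conversion concrete, the following is a minimal NumPy sketch of generic sparse attention. It is not CPSAA's design and does not model crossbars or PIM. It assumes a boolean `mask` marking which score entries survive pruning, with at least one retained entry per row. The function names (`sddmm`, `sparse_softmax`, `spmm`, `sparse_attention`, `dense_masked_attention`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sddmm(q, k, mask):
    """Sampled dense-dense matmul: evaluate (q @ k.T)[i, j] only where mask[i, j] is True."""
    rows, cols = np.nonzero(mask)
    # One dot product per retained (row, col) pair instead of the full n x n product.
    vals = np.einsum("nd,nd->n", q[rows], k[cols])
    return rows, cols, vals

def sparse_softmax(rows, vals, n_rows):
    """Row-wise softmax over the retained entries only."""
    row_max = np.full(n_rows, -np.inf)
    np.maximum.at(row_max, rows, vals)      # per-row max for numerical stability
    exp = np.exp(vals - row_max[rows])
    row_sum = np.zeros(n_rows)
    np.add.at(row_sum, rows, exp)           # per-row normaliser
    return exp / row_sum[rows]

def spmm(rows, cols, vals, v, n_rows):
    """Sparse-dense matmul: out[i] = sum over retained j of vals[i, j] * v[j]."""
    out = np.zeros((n_rows, v.shape[1]))
    np.add.at(out, rows, vals[:, None] * v[cols])
    return out

def sparse_attention(q, k, v, mask):
    """Attention computed as SDDMM -> sparse softmax -> SpMM."""
    n, d = q.shape
    rows, cols, scores = sddmm(q, k, mask)
    probs = sparse_softmax(rows, scores / np.sqrt(d), n)
    return spmm(rows, cols, probs, v, n)

def dense_masked_attention(q, k, v, mask):
    """Reference: full DDMM followed by masking, for comparison only."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

The sparse path touches only the retained score positions, whereas the dense reference computes all n x n scores and then discards the masked ones. The two produce the same output for any mask with at least one retained entry per row. The sketch also shows where irregular memory access comes from: the gathers `q[rows]`, `k[cols]`, and `v[cols]` are index-driven rather than contiguous.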