In this study, we introduce NeuralMatrix, a framework that enables versatile deep neural networks (DNNs) to be computed on a single general matrix multiplication (GEMM) accelerator. The approach overcomes the narrow specificity of ASIC-based accelerators while achieving application-specific levels of acceleration over general-purpose processors such as CPUs and GPUs. We address two challenges: mapping both the linear and the nonlinear operations in DNN computation to general matrix multiplications, and the impact of running on a GEMM accelerator on DNN inference accuracy. We conduct extensive experiments on representative backbone models from three popular DNN categories: CNNs, Transformers, and GNNs. Our results show that converting DNNs to general matrix multiplications costs at most 2.02% accuracy, while improving throughput per power by 113x and 19.44x compared to CPUs and GPUs, respectively.
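The abstract does not say how nonlinear operations are turned into matrix multiplications. One common way to do this is piecewise-linear approximation. The sketch below assumes that approach and does not claim to be the NeuralMatrix method. It approximates GELU with 16 chord segments on [-4, 4]. A one-hot selection matrix multiplied by a coefficient table then gathers each input's slope and intercept in a single GEMM, and a per-element multiply-add (y = a*x + b) finishes the calculation. The functions `build_segments`, `pwl_via_gemm`, and `segment_index`, along with the segment count and range, are hypothetical illustrations.

```python
import math


def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF (erf form).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul(a, b):
    # Plain GEMM: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_segments(f, lo, hi, k):
    # Hypothetical helper: split [lo, hi] into k equal segments and fit
    # the chord through each segment's endpoints.
    step = (hi - lo) / k
    breaks = [lo + i * step for i in range(k + 1)]
    coeffs = []
    for i in range(k):
        x0, x1 = breaks[i], breaks[i + 1]
        slope = (f(x1) - f(x0)) / (x1 - x0)
        coeffs.append([slope, f(x0) - slope * x0])  # row: [a_k, b_k]
    return breaks, coeffs


def segment_index(x, breaks):
    # Inputs outside [lo, hi] fall into the first or last segment (extrapolated).
    k = len(breaks) - 1
    for i in range(k):
        if x < breaks[i + 1]:
            return i
    return k - 1


def pwl_via_gemm(xs, breaks, coeffs):
    k = len(coeffs)
    # One-hot selection matrix S (n x k): row i marks the segment of xs[i].
    s = [[1.0 if j == segment_index(x, breaks) else 0.0 for j in range(k)]
         for x in xs]
    # GEMM gathers each input's [a, b] pair: (n x k) @ (k x 2) -> (n x 2).
    ab = matmul(s, coeffs)
    # Per-element multiply-add: y_i = a_i * x_i + b_i.
    return [row[0] * x + row[1] for row, x in zip(ab, xs)]


breaks, coeffs = build_segments(gelu, -4.0, 4.0, 16)
xs = [-3.0, -1.0, 0.0, 0.3, 1.2, 2.0]
ys = pwl_via_gemm(xs, breaks, coeffs)
```

With a segment width of 0.5, the chord error is bounded by h²/8 · max|f''|, which is roughly 0.025 for GELU. Adding more segments trades a larger coefficient table for higher accuracy.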