The computation types within a single Deep Neural Network (DNN) model are inherently diverse, which in turn requires a varied set of computation units in hardware processors. This diversity significantly constrains computation efficiency when different neural networks are executed. In this study, we present NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations, so that various DNN models can be executed on a single General-Purpose Matrix Multiplication (GEMM) accelerator. Extensive experiments across different DNN models demonstrate that our approach preserves network accuracy while providing both generality and application-specific levels of computation efficiency. As a result, a broad spectrum of DNN models can run on one GEMM accelerator without additional special function units.
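To make the idea of expressing a whole network as linear matrix operations more concrete, the sketch below shows one possible way a nonlinear activation could be reduced to multiply-add form. The abstract does not say which technique NeuralMatrix uses, so the piecewise-linear approximation here is an illustrative assumption, not the authors' method. The helper names (`build_pwl_table`, `pwl_apply`), the choice of GELU, the interval [-6, 6], and the 64 segments are likewise hypothetical.

```python
import numpy as np


def build_pwl_table(fn, lo, hi, num_segments):
    """Fit one (slope, intercept) pair per uniform segment of [lo, hi]."""
    edges = np.linspace(lo, hi, num_segments + 1)
    y = fn(edges)
    slopes = (y[1:] - y[:-1]) / (edges[1:] - edges[:-1])
    intercepts = y[:-1] - slopes * edges[:-1]
    return edges, slopes, intercepts


def pwl_apply(x, edges, slopes, intercepts):
    """Evaluate the approximation with a table lookup and one multiply-add per element.

    Inputs outside [edges[0], edges[-1]] are clamped to the interval,
    which is a simplification made for this sketch.
    """
    x = np.clip(x, edges[0], edges[-1])
    idx = np.searchsorted(edges, x, side="right") - 1
    idx = np.clip(idx, 0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]


def gelu(x):
    """Reference GELU (tanh formulation), used only to build and check the table."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


# Hypothetical settings: 64 uniform segments on [-6, 6].
edges, slopes, intercepts = build_pwl_table(gelu, -6.0, 6.0, 64)

rng = np.random.default_rng(0)
x = rng.uniform(-6.0, 6.0, size=(4, 8))   # a small batch of activations
w = rng.standard_normal((8, 3))           # weights of a following linear layer

hidden = pwl_apply(x, edges, slopes, intercepts)  # nonlinearity as multiply-add
out = hidden @ w                                  # linear layer as a plain GEMM
max_err = float(np.max(np.abs(hidden - gelu(x))))
```

In this sketch the only arithmetic left at inference time is multiplication and addition, which is the kind of work a GEMM unit performs. The segment lookup is extra bookkeeping that a real design would have to handle in its own way.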