The inherent diversity of computation types within individual deep neural network (DNN) models requires a corresponding variety of computation units in hardware processors, which significantly constrains computation efficiency during neural network execution. In this study, we introduce NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations, enabling them to be executed on a single general-purpose matrix multiplication (GEMM) accelerator. By overcoming the constraints imposed by the diverse computation types that individual network models require, this approach provides two benefits: generality, since a wide range of DNN models can run on a single GEMM accelerator, and application-specific levels of acceleration without extra special function units. Both benefits are validated on mainstream DNNs and their variant models.
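The abstract does not say how non-linear operations are turned into linear matrix operations. The sketch below illustrates one possible approach. It assumes that a non-linear activation (GELU here) is replaced by a piecewise-linear approximation, so that each output is a multiply-add `k * x + b` with per-segment coefficients, and that these coefficients can be packed into one matrix and applied with a single GEMM. All names (`gelu`, `build_segments`, `matmul`, `pwl_gemm`) and the choice of 16 uniform segments on [-4, 4] are hypothetical and are not taken from NeuralMatrix.

```python
import bisect
import math


def gelu(x: float) -> float:
    """Exact GELU, x * Phi(x), using the Gaussian CDF via math.erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_segments(fn, lo: float, hi: float, n: int):
    """Hypothetical helper: approximate fn on [lo, hi] with n chords.

    Returns the breakpoints plus one (slope, intercept) pair per segment.
    """
    step = (hi - lo) / n
    points = [lo + i * step for i in range(n + 1)]
    slopes, intercepts = [], []
    for a, b in zip(points[:-1], points[1:]):
        k = (fn(b) - fn(a)) / (b - a)
        slopes.append(k)
        intercepts.append(fn(a) - k * a)
    return points, slopes, intercepts


def matmul(a, b):
    """Plain GEMM on row-major lists: (m x k) @ (k x n) -> (m x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def pwl_gemm(xs, points, slopes, intercepts):
    """Evaluate the piecewise-linear approximation with a single GEMM.

    Row i of A = [diag(k_sel) | b_sel] holds the coefficients of the segment
    that x_i falls in. Multiplying A by the augmented vector [x; 1] therefore
    yields k_sel[i] * x_i + b_sel[i] for every element at once.
    """
    n_seg, m = len(slopes), len(xs)
    # Pick a segment per element; out-of-range inputs extrapolate the end segments.
    segs = [min(max(bisect.bisect_right(points, x) - 1, 0), n_seg - 1) for x in xs]
    a = [
        [slopes[segs[i]] if j == i else 0.0 for j in range(m)] + [intercepts[segs[i]]]
        for i in range(m)
    ]
    x_aug = [[x] for x in xs] + [[1.0]]  # shape (m + 1) x 1
    return [row[0] for row in matmul(a, x_aug)]


# Usage: 16 segments on [-4, 4], then compare against the exact activation.
points, slopes, intercepts = build_segments(gelu, -4.0, 4.0, 16)
xs = [-3.0, -0.5, 0.0, 0.7, 2.5]
approx = pwl_gemm(xs, points, slopes, intercepts)
for x, y in zip(xs, approx):
    print(f"x={x:+.2f}  gelu={gelu(x):+.4f}  pwl={y:+.4f}")
```

The diagonal coefficient matrix makes the "everything is a matrix multiplication" idea explicit, but it wastes most of its entries on zeros. The sketch does not attempt to model how a real accelerator would lay out or batch these multiply-adds efficiently.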